Abstract

We consider quantile regression processes from censored data under dependent data 
structures and derive a uniform Bahadur representation for those processes. We also 
consider cases where the dimension of the parameter in the quantile regression model 
is large. It is demonstrated that traditional penalized estimators such as the adaptive 
lasso yield sub-optimal rates if the coefficients of the quantile regression cross zero. New 
penalization techniques are introduced which are able to deal with specific problems 
of censored data and yield estimates with an optimal rate. In contrast to most of 
the literature, the asymptotic analysis does not require the assumption of independent 
observations, but is based on rather weak assumptions, which are satisfied for many 
kinds of dependent data. 
1 Introduction

Quantile regression for censored data has found considerable attention in the recent
literature. Early work dates back to Powell (1984), Powell (1986) and Newey and Powell
(1990), who proposed quantile regression methods in the case where all censoring variables
are known [see also Fitzenberger (1997)]. Ying et al. (1995) introduced median regression
in the presence of right independent censoring. Similar ideas were considered by Bang and
Tsiatis (2002) and later Zhou (2006).
All these papers have in common that the statistical analysis requires the independence of
the censoring times and covariates. Portnoy (2003) and Portnoy and Lin (2010) replaced 
this rather strong assumption by conditional independence of survival and censoring times 
conditional on the covariates. The resulting iterative estimation procedure was based on the 
principle of mass redistribution that dates back to the Kaplan-Meier estimate. An alterna- 
tive and very interesting quantile regression method for survival data subject to conditionally 
independent censoring was proposed by Peng and Huang (2008) and Huang (2010) who ex- 
ploited an underlying martingale structure of the data generating mechanism. In particular, 
in the four last-mentioned papers weak convergence of quantile processes was considered. 
This is an important question since it allows one to analyze simultaneously the impact of
covariates on different regions of the conditional distribution. We also refer to the recent work
of Wang and Wang (2009), Leng and Tong (2012) and Tang et al. (2012) who discussed 
quantile regression estimates that cope with censoring by considering locally weighted dis- 
tribution function estimators and employing mass-redistribution ideas. All of the references 
cited above have in common that the asymptotic analysis is rather involved and relies heav- 
ily on the assumption of independent observations. An important and natural question is, 
whether, and how far, this assumption can be relaxed. One major purpose of the present 
paper is to demonstrate that a sensible asymptotic theory can be obtained under rather weak 
assumptions on certain empirical processes that are satisfied for many kinds of dependent 
data. We do so by deriving a uniform Bahadur representation for the quantile process. In 
some cases, we also discuss the rate of the remainder term. 

The second objective of this paper deals with settings where the dimension of the parameter 
of the quantile regression model is large. In this case the estimation problem is intrinsically 
harder. Under sparsity assumptions penalized estimators can yield substantial improvements 
in estimation accuracy. At the same time, penalization allows one to identify those components
of the predictor which have an impact on the response. In the uncensored case, penalized
quantile regression has found considerable interest in the recent literature [see Zou and Yuan
(2008), Wu and Liu (2009) and Belloni and Chernozhukov (2011) among others]. On the
other hand, to the best of the authors' knowledge, there are only three papers which discuss
penalized estimators in the context of censored quantile regression. Shows et al. (2010) 
proposed to penalize the estimator developed in Zhou (2006) by an adaptive lasso penalty. 
These authors assumed unconditional independence between survival and censoring times 
and considered only the median. Wang et al. (2012) proposed to combine weights that 
are estimated by local smoothing with an adaptive lasso penalty. The authors considered 
a model selection at a fixed quantile and did not investigate process convergence of the 
corresponding estimator. 

In contrast to that, Wagener et al. (2012) investigated sparse quantile regression models
and properties of the quantile process in the context of censored data. Like Shows et al.
(2010), these authors assumed independence of the censoring times and predictors, which
may not be a reasonable assumption in many practical problems and moreover might lead
[...] Koenker (2008) and Portnoy (2009) [...]
more important point reflecting the difference between the philosophy of quantile versus
mean regression was not considered in the last-named paper. In contrast to mean regression,
quantile regression is concerned with the impact of predictors on different parts of the distribution.
This implies that the set of important components of the predictor could vary for different 
quantiles. For example, it might be possible that a certain part of the predictor has a strong 
influence on the 95%-quantile of the distribution of the response, while a different set relates 
to the median. Also, quantile coefficients might cross zero as the probability for which the 
quantile regression is estimated varies. Traditional analysis of penalized estimators, including 
the one given in Wagener et al. (2012), fails in such situations. At the same time, it might 
not be reasonable to exclude covariates from the model just because they have zero influence 
at a fixed given quantile. All those considerations demonstrate the need for penalization 
techniques that take into account the special features of quantile regression. To the best of 
our knowledge, no results answering these questions are available in the context of censored 
quantile regression. 

Therefore the second purpose of the present paper is to construct novel penalization tech- 
niques that are flexible enough to deal with the particular properties of censored quantile 
regression, and to provide a rigorous analysis of the resulting quantile regression processes. 
One major challenge for the theoretical analysis of censored regression quantiles in the present 
setting is the sequential nature of the underlying estimation procedures. While in other set- 
tings estimators for different quantiles do not interact, the situation is fundamentally different 
in the case of censored data when iterative procedures need to be applied. In the course 
of our analysis, we demonstrate that using traditional generalizations of concepts from the
mean regression setting can result in sub-optimal rates of convergence. As a remedy, we
propose penalties that avoid this problem and additionally allow one to analyze
the impact of predictors on quantile regions instead of individual quantiles. Finally, all our
results hold for a wide range of dependence structures thus considerably extending the scope 
of their applicability. 



The remaining part of the paper is organized as follows. The basic setup is introduced
in Section 2. In Section 3 we concentrate on the properties of the unpenalized estimator
in settings where the realizations need not be independent and derive a uniform Bahadur
representation. Various ways of penalizing the censored quantile process and the properties
of the resulting estimators are discussed in Section 4. A small simulation study illustrating
the findings in this section is presented in Section 5. Finally, all proofs and technical details
are deferred to an appendix in Section 6.
2 Censored quantile regression

We consider a censored regression problem with response $T_i$, predictor $Z_i$ and censoring
time $C_i$, where the random variables $T_i$ and $C_i$ may be dependent, but conditionally on
the $d$-dimensional covariate $Z_i$ the response $T_i$ and the censoring time $C_i$ are independent.
As usual we assume that instead of $T_i$ we only observe $X_i = T_i \wedge C_i$ and the indicator
$\delta_i = I\{X_i = T_i\}$. Let $\{T_i, C_i, Z_i\}_{i=1}^n$ denote $n$ identically distributed copies of the random
variable $(T_1, C_1, Z_1)$. The aim consists in statistical inference regarding the quantile function
of the random variable $T$ conditional on the covariate vector $Z$ on the basis of the sample
$\{X_i, Z_i, \delta_i\}_{i=1}^n$. In particular we would like to study the influence of the components of the
[...] Portnoy (2003) and Peng and Huang (2008), we assume that the conditional quantile
functions of $T$ are linear in $Z$, i.e.

$$Q_T(\tau|Z) := \inf\{t : P(T \le t|Z) \ge \tau\} = Z^t\beta(\tau) \tag{2.1}$$

for $\tau \in [\tau_L, \tau_U] \subset [0, 1)$. Combining ideas from the above references, an estimator for the
coefficient function $\beta(\tau)$, $\tau \in [\tau_L, \tau_U]$, can be constructed in an iterative manner. To be precise,
consider a uniformly spaced grid

$$\tau_L < \tau_1 < \dots < \tau_{N_\tau(n)} = \tau_U \tag{2.2}$$

with width $a_n = o(n^{-1/2})$ and set $b_n := a_n/(1 - \tau_U)$. The estimator for $\beta(\tau)$ is now defined
as a piecewise constant function. We follow Portnoy (2003) by assuming that there is no
censoring below the $\tau_L$'th quantile where $\tau_L > 0$. Setting $\tau_0 = \tau_L$, the estimator $\hat\beta(\tau_L)$ is
defined as the classical Koenker and Bassett (1978) regression quantile estimator without
taking censoring into account. For $j = 1, \dots, N_\tau(n)$ the estimator $\hat\beta(\tau_j)$ of $\beta(\tau_j)$ is then
sequentially defined as any value from the set of minimizers of the convex function

$$H_j(b) := \frac{1}{n}\sum_{i=1}^n \Big\{ \delta_i |X_i - Z_i^t b| + Z_i^t b \Big( \delta_i - 2\int_{[\tau_0,\tau_j)} I\{X_i \ge Z_i^t\hat\beta(u)\}\,dH(u) - 2\tau_0 \Big) \Big\}. \tag{2.3}$$



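Here $H(u) := -\log(1 - u)$ and $\hat\beta(\tau)$ is defined as constant and equal to $\hat\beta(\tau_j)$ whenever
$\tau \in [\tau_j, \tau_{j+1})$. The convexity of $H_j$ greatly facilitates the computation of the estimators. In
particular the computation of the directional derivative of the function $H_j$ at the point $b$ in
direction of $\xi$ yields

$$\Psi_j(b, \xi) = \frac{2}{n}\sum_{i=1}^n \xi^t Z_i \Big( N_i(Z_i^t b) - \int_{[\tau_0,\tau_j)} I\{X_i \ge Z_i^t\hat\beta(u)\}\,dH(u) - \tau_0 \Big)$$
$$\qquad\qquad + \frac{1}{n}\sum_{i=1}^n \delta_i I\{X_i = Z_i^t b\}\, \xi^t Z_i \big(\mathrm{sgn}(\xi^t Z_i) - 1\big) \tag{2.4}$$

where $N_i(t) := \delta_i I\{X_i \le t\}$ and $\mathrm{sgn}(a) := a/|a|$ if $a \neq 0$ with $\mathrm{sgn}(0) := 0$. We thus obtain
that any minimizer $\hat b$ of the function $H_j$ defined in (2.3) satisfies the condition

$$\inf_{\xi} \Psi_j(\hat b, \xi) \ge 0. \tag{2.5}$$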
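The ingredients of the sequential construction can be sketched in code. The following Python fragment is only an illustration under stated assumptions: the function names `make_grid` and `censoring_weight` are hypothetical, the grid width `a_n` is left to the caller, and the minimization of $H_j$ itself is not implemented. It builds a uniformly spaced grid as in (2.2) and evaluates the integral $\int_{[\tau_0,\tau_j)} I\{X_i \ge Z_i^t\hat\beta(u)\}\,dH(u)$ for a piecewise constant $\hat\beta$, in which case the integral reduces to a finite sum of increments of $H$.

```python
import math

def H(u):
    """H(u) = -log(1 - u), as defined in the text."""
    return -math.log(1.0 - u)

def make_grid(tau_L, tau_U, a_n):
    """Uniformly spaced grid tau_0 = tau_L < tau_1 < ... < tau_N = tau_U
    whose width is at most a_n (a_n is chosen by the caller)."""
    steps = max(1, math.ceil((tau_U - tau_L) / a_n))
    width = (tau_U - tau_L) / steps
    return [tau_L + k * width for k in range(steps)] + [tau_U]

def censoring_weight(x_i, z_i, beta_hat, grid, j):
    """Integral over [tau_0, tau_j) of I{x_i >= z_i' beta_hat(u)} dH(u).

    beta_hat[r] is the estimate that is held constant on [tau_r, tau_{r+1}),
    so the integral is a sum of increments H(tau_{r+1}) - H(tau_r)."""
    total = 0.0
    for r in range(j):
        fitted = sum(z * b for z, b in zip(z_i, beta_hat[r]))
        if x_i >= fitted:
            total += H(grid[r + 1]) - H(grid[r])
    return total
```

An observation that exceeds every fitted quantile up to $\tau_j$ receives the full weight $H(\tau_j) - H(\tau_0)$, while an observation below all of them receives weight zero.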
The first major contribution of the present paper consists in replacing the i.i.d. assumption
that underlies all asymptotic investigations considered so far by general conditions on certain
empirical processes. In particular, we demonstrate that these conditions are satisfied for
a wide range of dependency structures. Moreover, instead of providing results on weak
convergence, we derive a uniform (weak) Bahadur representation that can be used as starting
point for the investigation of general L-type statistics [see e.g. Portnoy and Koenker (1989)]
and rank-based testing procedures [see Gutenbrunner et al. (1993)].



Remark 2.1 Peng and Huang (2008) studied a closely related estimate. More precisely,
these authors proposed to set $\hat\beta(0) :=$ [...] and defined their estimator for $\beta(\tau_j)$ as the iterative
(generalized) solution of the equations

$$\sum_{i=1}^n Z_i\Big(N_i(Z_i^t b) - \int_{[0,\tau_j)} I\{X_i \ge Z_i^t\hat\beta(u)\}\,dH(u)\Big) \approx 0.$$

Note that this corresponds to the first line in the definition of $\Psi_j$ in equation (2.4). In the
case when the $X_i$ have a continuous distribution, the second line in the definition of $\Psi_j$ is of
order $O_P(1/n)$ uniformly with respect to $b$. Therefore (under this additional assumption)
this part is negligible compared to the rest of the equation and the proposed estimator can
thus be viewed as the solution of the estimating equation

$$\sum_{i=1}^n Z_i\Big(N_i(Z_i^t b) - \int_{[\tau_0,\tau_j)} I\{X_i \ge Z_i^t\hat\beta(u)\}\,dH(u) - \tau_0\Big) \approx 0$$

which corresponds to the one considered by Peng and Huang (2008) if we set $\tau_0 = 0$.



Remark 2.2 It is possible to show that in the case with no censoring up to a quantile $\tau_L$,
the estimator starting at $\tau_L$ and the version starting at $\tau_0 = 0$ considered by Peng and
Huang (2008) share the same limiting behavior. However, we would like to point out that,
in order for the estimator starting at $\tau_0 = 0$ to be well-behaved, conditions controlling the
entire lower part of the conditional distribution of the survival time given the covariates need
to be imposed. Obviously, no such assumptions are necessary for the version starting at $\tau_L$,
and for this reason this version seems to be preferable in cases where there is no censoring
below a certain quantile.

It is well known that in models with insignificant coefficients penalization of the estimators 
can yield significant improvements in the estimation accuracy. At the same time, this method 
allows for the identification of the components of the predictor which correspond to the non- 
vanishing components of the parameter vector. The second part of our paper is therefore 
devoted to considering penalized versions of the estimator described above. Penalization 
can be implemented by adding an additional term to the objective function in (2.3). More
precisely, we propose to define

$$\hat\beta(\tau_0) := \arg\min_b\ \frac{1}{n}\sum_{i=1}^n \rho_{\tau_0}(X_i - b^t Z_i) + \lambda_n \sum_{k=1}^d |b_k|/p_k(n, \tau_0)$$

and replace the function $H_j$ in (2.3) by

$$H_j(b) := \frac{1}{n}\sum_{i=1}^n \Big\{ \delta_i|X_i - Z_i^t b| + Z_i^t b\Big(\delta_i - 2\int_{[\tau_0,\tau_j)} I\{X_i \ge Z_i^t\hat\beta(u)\}\,dH(u) - 2\tau_0\Big)\Big\} + 2\lambda_n \sum_{k=1}^d |b_k|/p_k(n, \tau_j). \tag{2.6}$$

Here, the quantity $p(n,\tau_j) = (p_1(n,\tau_j), \dots, p_d(n,\tau_j))$ denotes a $d$-dimensional vector that,
together with $\lambda_n$, controls the amount of penalization and is allowed to depend on the data.
A very natural choice is given by a version of the adaptive lasso [see Zou (2006)], that is

$$p_k(n, \tau_j) = |\tilde\beta_k(\tau_j)| \tag{2.7}$$

($k = 1, \dots, d$) where $\tilde\beta(\tau)$ is some preliminary estimator for the parameter $\beta(\tau)$. A detailed
discussion of estimators based on this penalization will be given in Section 4.1. In particular,
we will demonstrate that in certain situations the adaptive lasso can lead to non-optimal
convergence rates. Alternative ways of penalization that avoid this problem will be discussed
in Section 4.2.



Remark 2.3 Note that we also allow the choice $p_k(n,\tau_j) = \infty$ throughout this paper if it
is not stated otherwise. With this choice the $k$'th component is not penalized, which is
reasonable if a variable is known to be important. For example, it is reasonable
not to penalize the component of $\beta$ corresponding to the intercept since it will typically
vary across quantiles and thus be different from zero.
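The role of the weights $p_k(n,\tau_j)$ can be illustrated with a few lines of Python. This is a minimal sketch, not the authors' implementation: the function names are hypothetical, and the handling of $p_k = 0$ (an infinite penalty unless $b_k = 0$, with the convention $0/0 = 0$) is an assumption made here for illustration only.

```python
import math

def adaptive_lasso_weights(beta_tilde):
    """p_k = |beta_tilde_k| as in (2.7), from a preliminary estimate."""
    return [abs(b) for b in beta_tilde]

def penalty(b, p, lam):
    """lam * sum_k |b_k| / p_k under the conventions assumed in this sketch:
    p_k = inf -> component k is not penalized (contributes 0);
    b_k = 0   -> contributes 0, even if p_k = 0;
    p_k = 0   -> infinite penalty whenever b_k != 0."""
    total = 0.0
    for b_k, p_k in zip(b, p):
        if math.isinf(p_k) or b_k == 0.0:
            continue
        if p_k == 0.0:
            return math.inf
        total += abs(b_k) / p_k
    return lam * total
```

Setting the first weight to `math.inf` leaves the intercept unpenalized, as suggested in Remark 2.3, while a preliminary estimate close to zero makes any non-zero value of the corresponding coefficient expensive.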



3 A Bahadur representation for dependent data

For the asymptotic results, we will need the following notation and technical assumptions
which are collected here for later reference. Consider the conditional distribution functions

$$F(t|z) := P(X \le t|z), \qquad \tilde F(t|z) := P(X \le t, \delta = 1|z)$$

and denote by $f(t|z)$, $\tilde f(t|z)$ the corresponding conditional densities. Define the quantities

$$\mu(b) := E[Z I\{X \le Z^t b, \delta = 1\}], \qquad \tilde\mu(b) := E[Z I\{X \ge Z^t b\}] \tag{3.1}$$

$$\nu_n(b) := \frac{1}{n}\sum_i Z_i N_i(Z_i^t b) - \mu(b), \qquad \tilde\nu_n(b) := \frac{1}{n}\sum_i Z_i I\{X_i \ge Z_i^t b\} - \tilde\mu(b). \tag{3.2}$$

We need the following conditions on the data-generating process.
We need the following conditions on the data-generating process. 
(C1) The model contains an intercept, that is $Z_{i,1} = 1$ a.s. for $i = 1, \dots, n$ and there exists
a finite constant $C_Z > 0$ such that $\|Z\| \le C_Z$ a.s. [here and throughout the paper,
denote by $\|\cdot\|$ the maximum norm].

(C2) There exists a finite constant $C_A$ such that

$$\|\beta(\tau_1) - \beta(\tau_2)\| \le C_A |\tau_1 - \tau_2|$$

for all $\tau_1, \tau_2 \in [\tau_L, \tau_U]$.

(C3) Define the set $B(\mathcal{T}, \varepsilon) := \{b \in \mathbb{R}^d : \inf_{\tau \in \mathcal{T}} \|b - \beta(\tau)\| \le \varepsilon\}$. Then

$$\sup_{b \in B(\mathcal{T},\varepsilon)} \sup_z f(z^t b|z) =: C_f < \infty, \qquad \sup_{b \in B(\mathcal{T},\varepsilon)} \sup_z \tilde f(z^t b|z) =: C_{\tilde f} < \infty.$$

Moreover $f$, $\tilde f$ are uniformly continuous on $\{b^t z : b \in B(\mathcal{T}, \varepsilon), z \in \mathcal{Z}\} \times \mathcal{Z}$ with respect
to both arguments and uniformly Hölder continuous with respect to the first argument,
i.e. for some $\gamma > 0$ and $H_f, H_{\tilde f} < \infty$

$$\sup_{b_1,b_2 \in B(\mathcal{T},\varepsilon)} \sup_z |f(z^t b_1|z) - f(z^t b_2|z)| \le H_f \|b_1 - b_2\|^\gamma,$$

$$\sup_{b_1,b_2 \in B(\mathcal{T},\varepsilon)} \sup_z |\tilde f(z^t b_1|z) - \tilde f(z^t b_2|z)| \le H_{\tilde f} \|b_1 - b_2\|^\gamma.$$
(C4) We have

$$\inf_{b \in B(\mathcal{T},\varepsilon)} \lambda_{\min}\big(E[Z Z^t \tilde f(Z^t b|Z)]\big) =: \lambda_0 > 0$$

where $\lambda_{\min}(A)$ denotes the smallest eigenvalue of the matrix $A$.



Remark 3.1 Condition (C1) has been imposed by all authors who considered model (2.1).
While it possibly could be relaxed, this would introduce additional technicalities and we
therefore leave this question to future research. Conditions (C2), (C3) place mild restrictions
on the regularity of the underlying data structure. Condition (C4) is similar to condition
(C4) in Peng and Huang (2008). It yields an implicit characterization of the largest quantile
that is identifiable in the given censoring model. For a more detailed discussion of this point,
we refer the interested reader to Section 3 of Peng and Huang (2008).



In contrast to most of the literature in this context which requires independent observations, 
our approach is based on a general condition on certain empirical processes which holds for 
many types of dependent data. More precisely, we assume the following conditions. 



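(D1) With the notation (3.2) we have

$$\sup_{b \in \mathbb{R}^d} \|\nu_n(b)\| + \sup_{b \in \mathbb{R}^d} \|\tilde\nu_n(b)\| = o_P(1).$$

(D2) For some $\varepsilon > 0$ define $B := \{b : \inf_{\tau \in [\tau_L,\tau_U]} \|b - \beta(\tau)\| \le \varepsilon\}$ and for a function $g$ on $B$
define

$$\omega_a(g) := \sup_{b_1, b_2 \in B,\ \|b_1 - b_2\| \le a} \|g(b_1) - g(b_2)\|.$$

Then the empirical processes $(\sqrt{n}\nu_n(b))_{b \in B}$ and $(\sqrt{n}\tilde\nu_n(b))_{b \in B}$ satisfy for any $a_n = o(1)$

$$\omega_{a_n}(\sqrt{n}\nu_n) = o_P(1), \qquad \omega_{a_n}(\sqrt{n}\tilde\nu_n) = o_P(1).$$

(D3) The process

$$\sqrt{n}\,w_n(s) := \sqrt{n}\Big( \frac{1}{n}\sum_{i=1}^n \tau_0 (Z_i - EZ_i) - \nu_n(\beta(s)) + \int_{[\tau_0,s)} \tilde\nu_n(\beta(u))\,dH(u) \Big)$$

indexed by $s \in [\tau_L, \tau_U]$ converges weakly towards a centered Gaussian process $W$.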

First of all, we would like to point out that for independent data, conditions (D1)-(D3) follow
under (C1) and (C3) and in this case

$$\omega_{a_n}(\sqrt{n}\nu_n) + \omega_{a_n}(\sqrt{n}\tilde\nu_n) = O_P\big((a_n \log n)^{1/2} \vee (n^{-1}\log n)^{1/2}\big).$$

We now provide a detailed discussion of results available in settings where the independence
assumption is violated. To this end, note that $\nu_{n,k}(b) = \int g_b\,dP_n - E[g_b(Z, X, \delta)]$ where
$g_b(z, x, \delta) := z_k I\{x \le z^t b\}\delta$ and $P_n$ denotes the empirical measure of the observations
$(X_i, Z_i, \delta_i)_{i=1,\dots,n}$. Thus for any set $B \subset \mathbb{R}^d$ the process $(\sqrt{n}\nu_{n,k}(b))_{b \in B}$ can be interpreted
as an empirical process indexed by the class of functions $\{g_b | b \in B\}$.



Remark 3.2 Combining Lemma 2.6.15 and Lemma 2.6.18 from van der Vaart and Wellner
(1996) shows that $\{g_b | b \in \mathbb{R}^d\}$ is VC-subgraph [see Chapter 2.6 in the latter reference for
details], and under assumption (C1) all functions in this class are uniformly bounded. Similar
arguments apply to $\tilde\nu_{n,k}(b)$. The problem of uniform laws of large numbers for VC-subgraph
classes of functions for dependent observations has been considered by many authors. A
good overview of recent results can be found in Adams and Nobel (2010) and the references
cited therein. In particular, the results in the latter reference imply that (D1) holds as soon
as $(X_i, Z_i, \delta_i)_{i \in \mathbb{Z}}$ is ergodic, (C1) is satisfied and the conditional distribution function of $X$
given $Z$, i.e. $F$, is uniformly continuous with respect to the first argument.

Remark 3.3 Condition (D2) essentially imposes uniform asymptotic equicontinuity of the
processes $n^{1/2}\nu_n$, $n^{1/2}\tilde\nu_n$. It is intrinsically connected to weak convergence of those processes.
More precisely, Theorem 1.5.7, Addendum 1.5.8 and Example 1.5.10 in van der Vaart and
Wellner (1996) imply that (D2) will hold as soon as the processes $n^{1/2}\nu_n$, $n^{1/2}\tilde\nu_n$ converge
weakly towards centered Gaussian processes, say $V$, $\tilde V$, with the additional property that
$E[(V(b_1) - V(b_2))^2] = o(1)$ implies $\|b_1 - b_2\| = o(1)$. Condition (D2) can thus be checked
by establishing weak convergence of $n^{1/2}\nu_n$, $n^{1/2}\tilde\nu_n$ and considering the properties of their
covariance. The literature on weak convergence of processes indexed by certain classes of
functions in dependent cases is rather rich.



Specifically, with the notation from Remark 3.2, it is possible to show that under assumption
(C3) the bracketing numbers [see Definition 2.1.6 in van der Vaart and Wellner (1996)] of
the class $\mathcal{G} := \{g_b | b \in B(\mathcal{T},\varepsilon)\}$ satisfy $N_{[\,]}(\epsilon, \mathcal{G}, P_{X,Z,\delta}) \le c\,\epsilon^{-d}$ for some finite constant $c$.
Thus, among many others, the results from Arcones and Yu (1994) for $\beta$-mixing, the results
from Andrews and Pollard (1994) for $\alpha$-mixing and the results from Hagemann (2012) for
data from general non-linear time series models can be applied to check condition (D2).
For example, the results in Arcones and Yu (1994) imply that (D2) will hold as soon as
$(Z_i, T_i, C_i)_{i \in \mathbb{Z}}$ is a strictly stationary, $\beta$-mixing sequence with coefficients $\beta_k = O(k^{-r})$ for
some $r > 1$.



We now are ready to state the main result of this section.

Theorem 3.4 Assume that $\tau_0 = \tau_L > 0$, that for some $\alpha > 0$ we have $P(C > Z^t\beta(\tau_0 + \alpha)) = 1$
and let assumptions (C1)-(C4), (D1)-(D3) hold. Then the representation

$$\hat\beta(s) - \beta(s) = (\mu'(\beta(s)))^{-1}\Big( w_n(s) - \int_{[\tau_0,s)} \Big( \prod_{(u,s]} \big(I_d + M_v^t\,dH(v)\big) \Big)^t M_u w_n(u)\,dH(u) \Big) + R_n(s) \tag{3.3}$$

holds uniformly in $s \in [\tau_L, \tau_U]$ where $M_u = (\mu'(\beta(u)))^{-1}\tilde\mu'(\beta(u))$, $\prod$ denotes the product-
integral [see Gill and Johansen (1990)], and for any $c_n \to \infty$ the remainder $R_n(s)$ satisfies

$$\sup_{\tau \in [\tau_L,\tau_U]} \sqrt{n}\|R_n(\tau)\| = O_P\big(n^{1/2} b_n + n^{-\gamma/2} + \omega_{c_n n^{-1/2}}(\sqrt{n}\nu_n) + \omega_{c_n n^{-1/2}}(\sqrt{n}\tilde\nu_n)\big).$$

In particular, this implies

$$\sqrt{n}\big(\hat\beta(\cdot) - \beta(\cdot)\big) \rightsquigarrow (\mu'(\beta(\cdot)))^{-1} V_{\tau_0}(\cdot) \tag{3.4}$$

in the space $D([\tau_L, \tau_U])$ equipped with the supremum norm and ball sigma algebra [see Pollard
(1984)]. Here $V_{\tau_0}$ denotes a centered Gaussian process given by

$$V_{\tau_0}(\tau) = W(\tau) - \int_{[\tau_0,\tau)} \Big( \prod_{(u,\tau]} \big(I_d + M_v^t\,dH(v)\big) \Big)^t M_u W(u)\,dH(u).$$

The uniform Bahadur representation derived above has many potential applications. For
example, it could be used to extend the L-statistic approach of Koenker and Portnoy (1987),
the rank tests of Gutenbrunner et al. (1993), or the confidence interval construction of Zhou
and Portnoy (1996) to the setting of censored and/or dependent data. We conclude this
section by discussing some interesting special cases and also possible extensions of the above
result.
Remark 3.5 In the case of independent data, standard arguments from empirical process
theory imply

$$\omega_{c_n n^{-1/2}}(\sqrt{n}\nu_n) + \omega_{c_n n^{-1/2}}(\sqrt{n}\tilde\nu_n) = O_P\big(n^{-1/4}(c_n \log n)^{1/2}\big).$$

Since $c_n$ can converge to infinity arbitrarily slowly, this shows that the remainder $R_n$ in (3.3) is
of order $O_P(b_n + n^{-(1+\gamma)/2} + n^{-3/4}(\log n)^{1/2})$. In particular, for $\gamma \ge 1/2$ and $b_n = O(n^{-3/4})$ we
obtain the same order as in the Bahadur representation of classical regression quantiles, see
e.g. Koenker and Portnoy (1987).



Remark 3.6 If only conditions (D1) and (C1)-(C4) hold, the proof of the result yields
uniform consistency of the proposed quantile estimators. If the $o_P(1)$ in condition (D1) can
be replaced by a rate $O_P(r_n)$ with $r_n$ tending to zero not faster than $n^{-1/2}$, it is again possible
to show that the censored regression quantiles converge uniformly with rate $O_P(b_n + r_n)$.



Remark 3.7 If there is no censoring we have $\delta_i = 1$, $i = 1, \dots, n$. In this case
$M_v = -I_d$ and thus for $0 < u < s < 1$

$$\prod_{(u,s]} \big(I_d + M_v^t\,dH(v)\big) = \prod_{(u,s]} (1 - dH(v))\,I_d = \exp\big(-(H(s) - H(u))\big) I_d = \frac{1-s}{1-u} I_d.$$

In particular, in this case

$$v_{\tau_0}(s) = w_n(s) - (1-s)\int_{[\tau_0,s)} \frac{w_n(u)}{(1-u)^2}\,du = w_n(\tau_0)\frac{1-s}{1-\tau_0} + \int_{[\tau_0,s)} \frac{1-s}{1-u}\,dw_n(u).$$

After noting that $Z^t\beta(\tau) = F^{-1}_{Y|Z}(\tau|Z)$, and thus $I\{X_i \le Z_i^t\beta(u)\} = I\{F_{Y|Z}(X_i|Z_i) \le u\}$,
straightforward but tedious calculations show that for $\delta_i = 1$

$$\int_{[0,s)} \frac{dw_n(u)}{1-u} = \frac{1}{n}\sum_{i=1}^n\, [\dots]$$

which gives

$$v_{\tau_0}(s) = -\frac{1}{n}\sum_{i=1}^n Z_i\big(I\{X_i \le Z_i^t\beta(s)\} - s\big).$$

Thus the representation in (3.3) corresponds to the Bahadur representation of regression
quantiles in the completely uncensored case [see e.g. Koenker and Portnoy (1987)], and the
proposed procedure is asymptotically equivalent to classical quantile regression.
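The scalar product-integral identity used in Remark 3.7 can be checked numerically. The sketch below is an illustration only (the function name `product_integral` and the number of steps are arbitrary choices made here): it approximates $\prod_{(u,s]}(1 - dH(v))$ by a finite product over a fine partition and compares the result with the closed form $(1-s)/(1-u)$.

```python
import math

def H(u):
    """H(u) = -log(1 - u), as defined in Section 2."""
    return -math.log(1.0 - u)

def product_integral(u, s, steps=100_000):
    """Finite-product approximation of prod_{(u,s]} (1 - dH(v)):
    multiply 1 - (H(right) - H(left)) over a uniform partition of (u, s]."""
    prod = 1.0
    width = (s - u) / steps
    left = u
    for k in range(1, steps + 1):
        right = u + k * width
        prod *= 1.0 - (H(right) - H(left))
        left = right
    return prod

def closed_form(u, s):
    """exp(-(H(s) - H(u))) = (1 - s) / (1 - u)."""
    return (1.0 - s) / (1.0 - u)
```

For a fine partition the two values agree up to a discretization error of the order of the partition width, which is the behavior expected of a product integral with respect to a continuous integrator.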






4 Penalizing quantile processes

In this section we will discuss several aspects of penalization for quantile processes. For this
purpose we need some additional notation and assumptions. Let $\|\cdot\|$ denote the maximum
norm in a Euclidean space. For a set $J = \{j_1, \dots, j_{|J|}\} \subset \{1, \dots, d\}$ with $j_1 < j_2 < \dots < j_{|J|}$
define

$$\beta^{(J)} := (\beta_j I\{j \in J\})_{j=1,\dots,d}$$

as the vector obtained from $\beta$, where components corresponding to indices $j \notin J$ are set
to zero. The vector $\beta_J = (\beta_{j_1}, \dots, \beta_{j_{|J|}})^t$ is defined as the vector of non-vanishing
components of $\beta^{(J)}$. Finally, introduce the matrix $P_J$ that corresponds to mapping coordinate $j_l$
to coordinate $l$ ($l = 1, \dots, |J|$) and the remaining coordinates to $|J|+1, \dots, d$ (in increasing order).


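Assume that the penalization in (2.6) satisfies the following assumption (here $\mathcal{P}(A)$ denotes
the power set of $A$)

(P) There exists a (set-valued) mapping $\chi : [\tau_L, \tau_U] \to \mathcal{P}(\{1, \dots, d\})$ such that $\beta_k(\tau) = 0$
for all $k \in \chi(\tau)^C$, $\tau \in [\tau_L, \tau_U]$, and additionally

$$\sqrt{n}\Lambda_{n,0} := \sqrt{n} \inf_j \inf_{k \in \chi(\tau_j)^C} \frac{\lambda_n}{p_k(n, \tau_j)} \xrightarrow{P} \infty, \tag{4.1}$$

$$\Lambda_{n,1} := \sup_j \sup_{k \in \chi(\tau_j)} \frac{\lambda_n}{p_k(n, \tau_j)} = o_P(1/\sqrt{n}). \tag{4.2}$$

Moreover, there exist real numbers $\tau_L = \theta_1 < \dots < \theta_K = \tau_U$ such that $\chi$ is constant on
intervals of the form $[\theta_j, \theta_{j+1})$, $j = 1, \dots, K-1$.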

A more detailed discussion of various penalizations satisfying condition (P) will be given in
Sections 4.1 and 4.2. In particular, in Section 4.1 we will provide conditions which guarantee
that the adaptive lasso penalty in (2.7) fulfills (P) and discuss what happens if those
conditions fail. Alternative ways of choosing the penalty that do not suffer from the same
problem and additionally allow one to investigate the impact of covariates on multiple quantiles
will be considered in Section 4.2.

For the results that follow, we need to strengthen assumption (D1) to

(D1') With the notation (3.2) we have

$$\sup_{b \in \mathbb{R}^d} \|\nu_n(b)\| + \sup_{b \in \mathbb{R}^d} \|\tilde\nu_n(b)\| = O_P(n^{-1/2}).$$

Strengthening (D1) allows us to replace assumption (C4) by the weaker, and more realistic,
version [note that for any $J \subset \{1, \dots, d\}$ we have $\lambda_{\min}(E[(Z^{(J)})(Z^{(J)})^t \tilde f(Z^t b|Z)]) \ge
\lambda_{\min}(E[Z Z^t \tilde f(Z^t b|Z)])$ due to the special structure of the matrices].

(C4') We have for the map $\chi$ from condition (P)

$$\inf_{b \in B(\mathcal{T},\varepsilon)} \lambda_{\min}\big(E[(Z^{(\chi(\tau))})(Z^{(\chi(\tau))})^t \tilde f(Z^t b|Z)]\big) =: \lambda_0 > 0$$

where $\lambda_{\min}(A)$ denotes the smallest eigenvalue of the matrix $A$.



Remark 4.1 As discussed in Remark 3.2, the statement of (D1) can be viewed as a Glivenko-
Cantelli type result for an empirical process indexed by a VC-subgraph class of functions.
Similarly, (D1') follows if the same class of functions satisfies a Donsker type property.
Results of this kind have for example been established for $\beta$-mixing data. More precisely,
Corollary 2.1 in Arcones and Yu (1994) shows that (D1') holds as soon as the $\beta$-mixing
coefficient $\beta_r$ satisfies $\beta_r = o(r^{-k})$ for some $k > 1$.

Remark 4.2 The results that follow continue to hold if we strengthen assumption (C4') to
(C4) and replace (D1') by (D1). The details are omitted for the sake of brevity.

We now are ready to state our first main result, which shows that under assumption (P) on
the penalization, the estimate defined in (2.6) enjoys a kind of 'oracle' property in the
sense of Fan and Li (2001). More precisely, with probability tending to one the coefficients
outside the set $\chi(\tau)$ are set to zero uniformly in $\tau$ and the estimators of the remaining
coefficients have the same asymptotic distribution as the estimators in the sub-model defined
by $\chi(\tau)$.

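Theorem 4.3 Assume that $\tau_0 = \tau_L > 0$, that for some $\alpha > 0$ we have $P(C > Z^t\beta(\tau_0 + \alpha)) = 1$
and let assumptions (C1)-(C3), (C4'), (D1'), (D2)-(D3) and (P) hold. Then we have as
$n \to \infty$

$$P\Big( \sup_{\tau \in [\tau_L,\tau_U]} \sup_{k \in \chi(\tau)^C} |\hat\beta_k(\tau)| = 0 \Big) \to 1. \tag{4.3}$$

Moreover,

$$\mu(\hat\beta(\tau)) - \mu(\beta(\tau)) = M_{\tau,\chi} v_{\tau_L}(\tau) + o_P(1/\sqrt{n}) \tag{4.4}$$

uniformly in $\tau \in [\tau_L, \tau_U]$ where

$$v_{\tau_L}(s) := w_n(s) - \int_{[\tau_L,s)} \Big( \prod_{(u,s]} \big(I_d + (\mathcal{M}_{v,\chi} M_{v,\chi})^t\,dH(v)\big) \Big)^t \mathcal{M}_{u,\chi} M_{u,\chi} w_n(u)\,dH(u),$$

$\prod$ denotes the product-integral [see Gill and Johansen (1990)], the matrices $M_{\tau,\chi}$, $\mathcal{M}_{\tau,\chi}$ are
defined by

$$M_{\tau,\chi} := \mu'(\beta(\tau)) P_{\chi(\tau)}^t \begin{pmatrix} \mathbf{M}_{\tau,\chi(\tau)}^{-1} & 0 \\ 0 & 0 \end{pmatrix} P_{\chi(\tau)}, \qquad \mathcal{M}_{\tau,\chi} := \tilde\mu'(\beta(\tau)) P_{\chi(\tau)}^t \begin{pmatrix} \mathbf{M}_{\tau,\chi(\tau)}^{-1} & 0 \\ 0 & 0 \end{pmatrix} P_{\chi(\tau)},$$

and $\mathbf{M}_{\tau,\chi(\tau)} := E[(Z^{(\chi(\tau))})(Z^{(\chi(\tau))})^t \tilde f(Z^t\beta(\tau)|Z)]$. In particular, this implies

$$\sqrt{n}\big(\hat\beta(\cdot) - \beta(\cdot)\big) \rightsquigarrow P_{\chi(\cdot)}^t \begin{pmatrix} \mathbf{M}_{\cdot,\chi(\cdot)}^{-1} & 0 \\ 0 & 0 \end{pmatrix} P_{\chi(\cdot)} V_{\tau_L,\chi}(\cdot) \tag{4.5}$$

in the space $D([\tau_L, \tau_U])$ equipped with the supremum norm and ball sigma algebra [see Pollard
(1984)]. Here $V_{\tau_L,\chi}$ denotes a centered Gaussian process given by

$$V_{\tau_L,\chi}(\tau) = W(\tau) - \int_{[\tau_L,\tau)} \Big( \prod_{(u,\tau]} \big(I_d + (\mathcal{M}_{v,\chi} M_{v,\chi})^t\,dH(v)\big) \Big)^t \mathcal{M}_{u,\chi} M_{u,\chi} W(u)\,dH(u).$$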

The asymptotic representation of the limiting process above is quite complicated. We now 
give a brief discussion of some special cases where it can be further simplified. 

Remark 4.4 If there is no penalization, then $\chi(\tau) = \{1, \dots, d\}$ and $P_{\chi(\tau)}$ and $M_{\tau,\chi}$ both are
equal to the $d \times d$ identity matrix and $\mathcal{M}_{\tau,\chi} = \tilde\mu'(\beta(\tau))(\mu'(\beta(\tau)))^{-1}$. In this case, an analogue
of Theorem 4.3 is obtained from Theorem 3.4, but without the rate on the remainder term. If
only the first $k < d$ components are important, i.e. if $\chi(\tau) = \{1, \dots, k\}$ for $\tau \in [\tau_L, \tau_U)$, $P_{\chi(\tau)}$
has a $k \times k$ identity matrix as the left upper block and the remaining entries are zero. The
same holds for $M_{\tau,\chi}$. Thus in this case the asymptotic distribution of the first $k$ components
would be equal to the distribution in a smaller model where only those components are
considered. This means that the proposed procedure has a kind of 'oracle property'.



Remark 4.5 Under additional regularity assumptions, similar results can be derived for the
version of the estimator starting with $\tau_0 = \tau_L = 0$ [see Remark 2.1]. The technical details
are omitted for the sake of brevity.

4.1 Adaptive lasso penalization

Recall the definition of the penalization in (2.7) and assume that for some $J \subset \{1, \dots, d\}$

$$\inf_{k \in J} \inf_{\tau \in [\tau_L,\tau_U]} |\beta_k(\tau)| > 0, \qquad \sup_{k \in J^C} \sup_{\tau \in [\tau_L,\tau_U]} |\beta_k(\tau)| = 0, \tag{4.6}$$

then the following statement is correct.

Corollary 4.6 Assume that the conditions of Theorem 4.3 are satisfied and that (4.6) and

$$\sqrt{n}\lambda_n \to 0, \qquad n\lambda_n \to \infty \tag{4.7}$$

hold. If the preliminary estimator $\tilde\beta$ in (2.7) is uniformly consistent with rate $O_P(1/\sqrt{n})$
on the interval $[\tau_L, \tau_U]$, then the penalization (2.7) satisfies (P) with $\chi(\tau) = J$.
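To see why the two rate conditions in (4.7) are natural, a short worked example may help. It assumes, purely for illustration, a polynomial tuning sequence $\lambda_n = n^{-b}$, and the last two lines are a heuristic check of (4.1) and (4.2) rather than a proof.

```latex
% Assumption for illustration: \lambda_n = n^{-b} with a constant b.
\sqrt{n}\,\lambda_n = n^{1/2-b} \to 0 \iff b > 1/2, \qquad
n\,\lambda_n = n^{1-b} \to \infty \iff b < 1 .
% Heuristic check of (4.1) and (4.2) with p_k(n,\tau_j) = |\tilde\beta_k(\tau_j)|:
% for k \in J^C we have |\tilde\beta_k(\tau_j)| = O_P(n^{-1/2}), so
\sqrt{n}\,\frac{\lambda_n}{p_k(n,\tau_j)} \gtrsim n\,\lambda_n \to \infty ,
% for k \in J the weight |\tilde\beta_k(\tau_j)| stays bounded away from zero
% with probability tending to one, so
\frac{\lambda_n}{p_k(n,\tau_j)} = O_P(\lambda_n) = o_P(n^{-1/2}) .
```

Under this assumption, (4.7) therefore holds exactly for $b \in (1/2, 1)$.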
The result shows that the adaptive lasso is $\sqrt{n}$ consistent under the assumption (4.6). It is
of interest to investigate if a condition of this type is in fact necessary for the optimal rate
of convergence. The following result gives a partial answer to this question and shows that
the optimal rate cannot be achieved by the adaptive lasso defined in (2.7) if some of the
coefficients of the quantile regression change their sign or run into zero as $\tau$ varies. More
precisely we provide a lower bound on the uniform rate of convergence of the estimator which
turns out to be larger than $n^{-1/2}$ in quantile regions where coefficients are 'close' but not
exactly equal to zero. For a precise statement we define the sets [the dependence on $n$ is
suppressed in the notation for the sake of brevity]



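$$P_j := \{\tau \in [\tau_L, \tau_U] : [\dots] \ge |\beta_j(\tau)| > [\dots]\},$$
$$B_j := \{\tau \in [\tau_L, \tau_U] : |\beta_j(\tau)| > [\dots]\},$$
$$S_j := \{\tau \in [\tau_L, \tau_U] : [\dots] \ge |\beta_j(\tau)| > 0\},$$
$$V_j := \{\tau \in [\tau_L, \tau_U] : \beta_j(\tau) = 0\}.$$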

Remark 4.7 Basically, the sets defined above reflect the different kinds of asymptotic
behavior of penalized estimators. The sets $B_j$ correspond to values of $\tau$ with $j$'th coefficients
being 'large enough', such that they are not affected by the penalization asymptotically. In
contrast to that, coefficients $\beta_j(\tau)$ with $\tau \in S_j$ are 'too small' and will be set to zero with
probability tending to one. In particular, this implies that the order of the largest elements
in the set $\{|\beta_j(\tau)| : \tau \in S_j\}$ will give a lower bound for the uniform convergence rate of the
penalized estimator. Finally, the set $P_j$ corresponds to 'intermediate' values that might be
set to zero with positive probability.

In order to state the next result, we need to make the following additional assumptions.
(C4*) Define the map $\chi : [\tau_0,\tau_U] \to \mathcal{P}(\{1,\dots,d\})$ with $\chi(\tau) := \{j : |\beta_j(\tau)| \neq 0\}$. Then

inf A min (E[(Z^))(Z«W)) t /(Z t b|Z)]) =: A > 

where $\lambda_{\min}(A)$ denotes the smallest eigenvalue of the matrix $A$.
(B1) $\sqrt{n}\,\lambda_n = o(1)$, $n\lambda_n \to \infty$, $\sqrt{n}\,\kappa_n\lambda_n \to 1$.

(B2) The set PU S with P := UjPj, S := UjSj is a finite union of intervals and its Lebesgue 
measure is bounded by C^i^^j for some positive constants 7 < 00 and a finite 
constant C 1 . 

(B3) c n -> 00, Kn^K^c- 1 -> 00, nV*cl +1 /Kl +l/2 = o(l). 
(B4) The preliminary estimator is uniformly consistent with rate $O_P(1/\sqrt{n})$.

Remark 4.8 Assume that $\lambda_n \sim n^{-b}$ for some $b \in (1/2,1)$ and $c_n \sim \log(n)$ (it will later
become apparent why choosing $c_n$ to converge to infinity slowly makes sense). Then $\kappa_n \sim$
n 6 - 1 / 2 , X n n 3 / 4 K n /2 ~ n^' 2 and n 1 '* / kI +1/2 ~ n (i+7-*>(7+2))/2. Thus condition ( B3 ) will 

hold as soon as \ V < b < 1. 

Remark 4.9 Condition (B2) places a restriction on the behavior of the coefficients $\beta_j(\tau)$
in a neighborhood of $\{\tau \mid \beta_j(\tau) = 0\}$. Essentially, it will hold if no coefficient approaches
zero in a 'too smooth' way. If for example the function $\tau \mapsto \beta_j(\tau)$ is sufficiently often
continuously differentiable, (B2) will hold with $\gamma = 1/a$ where $a$ is the smallest number such
that the $a$'th derivative of $\beta_j(\tau)$ does not vanish at all points $\theta$ with $\beta_j(\theta) = 0$ for some $j$.
In particular, in the case $\gamma = 1$ this property means that $\beta(\tau)$ crosses zero with a positive slope.
The results in Remark 4.8 show that $\lambda_n \sim n^{-b}$ for any $b \in (1/2,1)$ is allowed when $c_n = \log n$.
If $\beta(\tau)$ runs into zero more smoothly, which corresponds to $\gamma < 1$, the conditions on the
regularizing parameter $\lambda_n$ become stricter, since only a smaller range of values $b < 1$ is then allowed.

Theorem 4.10 Assume that conditions (C1)-(C3), (C4*), (D1'), (D2)-(D3), (B1)-(B4)
hold. Then the adaptive lasso estimator obtained from the penalization (2.7) satisfies



sup \\fa)-p( u )\\=0 P (- 1 £—). (4.8) 



re[T L ,Tu\ -~Kn n 1 / 4 * 

Moreover, for any fixed I C [r L , Tu]\(S U P) 



n(/?Q -/?(•)) -+ ( Mr ;f (T) °)^(r)V^(0 (4.9) 



in the space $D(I)^d$, where the limiting process is defined in Theorem 4.3, and

$$P\Big(\sup_{j=1,\dots,d}\ \sup_{\tau \in (S_j \cup V_j) \cap [\tau_L,\tau_U]} |\hat\beta_j(\tau)| = 0\Big) \to 1. \qquad (4.10)$$



Note that assertion (4.10) implies that the uniform rate of $\hat\beta$ is bounded from below
by $n^{-1/4}\kappa_n^{-1/2}c_n^{-1}$ as soon as the set $S \cup P$ is not empty. Since $c_n$ is allowed to converge to
infinity arbitrarily slowly, we obtain the lower bound $O(n^{-1/4}\kappa_n^{-1/2}) = O(\lambda_n^{1/2})$, which depends
on $\lambda_n$ and is always slower than $1/\sqrt{n}$. We will demonstrate in Section 5 by means of a
simulation study that this inferior property of the adaptive lasso can also be observed for
realistic sample sizes.
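To make the size of this gap concrete, the following sketch compares the lower bound of order $\lambda_n^{1/2}$ with the parametric rate $n^{-1/2}$. It assumes the parametrization $\lambda_n = n^{-b}$ with $b \in (1/2,1)$ from Remark 4.8; the function names are illustrative only and do not come from the paper.

```python
def adaptive_lasso_lower_bound(n: int, b: float) -> float:
    """Order lambda_n^(1/2) = n^(-b/2) of the lower bound, assuming lambda_n = n^(-b)."""
    if not 0.5 < b < 1.0:
        raise ValueError("b must lie in (1/2, 1)")
    return n ** (-b / 2.0)


def rate_ratio(n: int, b: float) -> float:
    """Ratio of the lower bound to the parametric rate n^(-1/2); equals n^((1 - b) / 2)."""
    return adaptive_lasso_lower_bound(n, b) / n ** (-0.5)


if __name__ == "__main__":
    # The ratio grows with n for every admissible b (illustrative values only).
    for n in (100, 1000, 10000):
        print(n, round(rate_ratio(n, b=0.75), 3))
```

The ratio diverges polynomially in $n$ for every admissible $b$, which is the sense in which the point-wise adaptive lasso is slower than $1/\sqrt{n}$ near a zero crossing.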



Remark 4.11 Theorem 4.10 also contains a positive and, at first glance, probably surprising
result. Since the procedure used to compute the estimators is iterative, one might
expect that a non-optimal convergence rate of the estimator at one value of $\tau$ should yield the
same lower bound for all subsequent quantile estimators. However, the above results imply
that this is not always the case. The intuitive reason for this phenomenon is the following:
the estimators $\hat\beta(\tau)$ only enter the subsequent estimating equation inside an integral, see
equation (2.6). Thus, when the rate is not optimal on a sufficiently small set of values $\tau$,
the overall impact of a non-optimal rate might still be small. In particular, this is the case
under conditions (B2)-(B4).



Remark 4.12 The results in the above Theorem are related to the findings of Pötscher and
Leeb (2009), which demonstrate that penalized estimators do not have optimal convergence
rates uniformly over the parameter space. This also suggests that using other point-wise
penalties, such as SCAD, will not solve the problems encountered by the adaptive
lasso. Instead, using information from other quantiles is necessary.

4.2 Average penalization 

As we have seen in the last section, the traditional way of implementing the adaptive lasso
will yield sub-optimal rates of convergence if some coefficients cross zero. Moreover, this
method will perform a 'point-wise' model selection with respect to quantiles, a property
which might not always be desirable. Rather, keeping the same model for certain ranges
of quantiles, such as $\tau \in [.4, .6]$, or even for the whole range, might often be
preferable. In order to implement such an approach, and to obtain a quantile process which
converges at the optimal rate, we introduce a new kind of adaptive penalization which has,
to the best of our knowledge, not been considered in the literature so far. More precisely,
denote by $T_1,\dots,T_K$ a fixed, disjoint partition of $[\tau_0,\tau_U]$ and define

$$p_k^{int}(n,\tau) := \sum_{j=1}^{K} I\{\tau \in T_j\} \int_{T_j} |\tilde\beta_k(t)|\,h(t)\,dt, \qquad k = 1,\dots,d, \qquad (4.11)$$

$$p_k^{max}(n,\tau) := \sum_{j=1}^{K} I\{\tau \in T_j\} \sup_{t \in T_j} |\tilde\beta_k(t)|, \qquad k = 1,\dots,d. \qquad (4.12)$$

Here, $\tilde\beta$ is a preliminary estimator which converges uniformly with rate $O_P(1/\sqrt{n})$ on the
interval $[\tau_0,\tau_U]$, and $h$ is a strictly positive, uniformly bounded weight function integrating
to one. In the following discussion we call this method average adaptive lasso.
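The two averaged penalties can be illustrated with a small numerical sketch. It is not the authors' implementation: it assumes an equispaced quantile grid, approximates the integral in (4.11) by a Riemann sum, and treats each partition cell as a half-open interval with the last cell closed on the right. The function name `average_penalties` and its arguments are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def average_penalties(
    grid: Sequence[float],
    beta_tilde: Sequence[float],
    partition: Sequence[Tuple[float, float]],
    h: Callable[[float], float],
) -> Tuple[List[float], List[float]]:
    """Return (p_int, p_max) on the grid for one coefficient path.

    Each cell (lo, hi) is treated as [lo, hi); the last cell also contains hi.
    The integral in (4.11) is approximated by a Riemann sum with the grid spacing.
    """
    step = grid[1] - grid[0]  # assumption: equispaced grid
    p_int = [0.0] * len(grid)
    p_max = [0.0] * len(grid)
    last = len(partition) - 1
    for c, (lo, hi) in enumerate(partition):
        idx = [i for i, t in enumerate(grid)
               if lo <= t < hi or (c == last and t == hi)]
        if not idx:
            continue
        integral = sum(abs(beta_tilde[i]) * h(grid[i]) * step for i in idx)
        supremum = max(abs(beta_tilde[i]) for i in idx)
        for i in idx:
            # Every quantile level in the same cell receives the same penalty weight.
            p_int[i] = integral
            p_max[i] = supremum
    return p_int, p_max
```

With a preliminary path that passes through zero inside a cell, the point-wise weight $|\tilde\beta_k(\tau)|$ is close to zero at the crossing, whereas both averaged weights stay bounded away from zero on the whole cell.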

Remark 4.13 The above idea can be generalized to the setting where the researcher wants
to include a whole set of predictors, say $(Z_k)_{k \in S}$, in the analysis if at least one of those
predictors is important. This can be done by setting

$$p_k(n,\tau) := \max_{m \in S} \sum_{j=1}^{K} I\{\tau \in T_j\} \sup_{t \in T_j} |\tilde\beta_m(t)|, \qquad k \in S.$$
Remark 4.14 In the context of uncensored quantile regression, Zou and Yuan (2008)
recently proposed to simultaneously penalize a collection of estimators for different quantiles
in order to select the same group of predictors for different values of the quantile. While
such an approach is extremely interesting, it seems hard to implement in the present
situation. The reason is that the minimization problem (2.6) is solved in an iterative fashion,
and dealing with a penalty that affects all quantiles at the same time is thus problematic.



The following result follows 

Lemma 4.15 Assume that there exist sets $J_1,\dots,J_K \subset \{1,\dots,d\}$ such that

$$\inf_{j=1,\dots,K}\ \inf_{k \in J_j}\ \sup_{\tau \in T_j} |\beta_k(\tau)| > 0, \qquad \sup_{j=1,\dots,K}\ \sup_{k \in J_j^C}\ \sup_{\tau \in T_j} |\beta_k(\tau)| = 0, \qquad (4.13)$$

and (4.7) hold. If the preliminary estimator $\tilde\beta$ is uniformly consistent with rate $O_P(1/\sqrt{n})$
on the interval $[\tau_L,\tau_U]$, then the average penalties defined in (4.11) and (4.12) satisfy (P) with
$\chi(\tau) = J_j$ for $\tau \in T_j$.

The above results imply that the problems encountered by the traditional application of
adaptive lasso when coefficients cross zero can be avoided if average penalization is used.
Another consequence of such an approach is that predictors which are important for some
quantile $\tau \in T_k$ will be included in the analysis for all quantiles in $T_k$. At the same time,
covariates that have no impact for any $\tau \in T_k$ can still be excluded from the analysis. Finally,
by taking $T_1 = [\tau_0,\tau_U]$ it is possible to achieve that all covariates that are important at some
quantile in the range of interest will be used for all $\tau \in [\tau_0,\tau_U]$. As a consequence, average
penalization is a highly flexible method that can easily be adapted to the situation at hand.



5 Simulation study 

In order to study the finite-sample properties of the proposed procedures we conducted a
small simulation study. An important practical question is the selection of the regularizing
parameter $\lambda_n$. In our simulations, we used an adapted version of K-fold cross validation
which accounts for the presence of censoring by using a weighted objective function. More
precisely, we proceeded in two steps. In the first step, weights were estimated as follows.

1. Compute an unpenalized estimator based on all data, denote this estimator by $\hat b$.



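2. For each grid point $\tau$, following Portnoy (2003), define weights $\hat w_j(\tau)$ through

$$\hat w_j(\tau) := \delta_j + (1-\delta_j)\Big( I\{X_j > Z_j^t \hat b(\tau)\} + I\{X_j \le Z_j^t \hat b(\tau)\}\,\frac{\tau - \tau_j}{1 - \tau_j} \Big).$$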
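Here, $\tau_j$ denotes the value of $\tau$ at which the observation $X_j$ is 'crossed', that is

$$\tau_j := \begin{cases} 1 & \text{if } X_j > Z_j^t \hat b(\tau_U), \\ \min\{\tau_k \mid Z_j^t \hat b(\tau_{k-1}) < X_j \le Z_j^t \hat b(\tau_k)\} & \text{if } \delta_j = 0,\ X_j \le Z_j^t \hat b(\tau_U), \\ 0 & \text{if } \delta_j = 1,\ X_j \le Z_j^t \hat b(\tau_U). \end{cases}$$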
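The weighting rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors' code: the fitted values $Z_j^t \hat b(\tau_k)$ are assumed to be available on the grid and to be non-decreasing in $\tau$, the crossing level is taken to be the first grid level whose fitted value is at least $X_j$, and the names `crossing_level` and `portnoy_weight` are hypothetical.

```python
from typing import Sequence


def crossing_level(x: float, fits: Sequence[float], grid: Sequence[float]) -> float:
    """First grid level tau_k with x <= fitted value at tau_k; 1.0 if x is never crossed."""
    for t, f in zip(grid, fits):
        if x <= f:
            return t
    return 1.0


def portnoy_weight(tau: float, x: float, delta: int, fit_tau: float, tau_cross: float) -> float:
    """Weight of one observation at quantile level tau.

    x: observed (possibly censored) response, delta: 1 if uncensored,
    fit_tau: fitted value at level tau, tau_cross: level at which x was crossed.
    """
    if delta == 1 or x > fit_tau:
        # Uncensored observations, and censored ones not yet crossed, keep full weight.
        return 1.0
    # Crossed censored observation: part of its mass is redistributed to the right.
    return (tau - tau_cross) / (1.0 - tau_cross)
```

A censored observation crossed at level 0.5 therefore keeps weight 0.5 at $\tau = 0.75$, and the remaining mass is the part that the procedure moves to the right.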



Note that Portnoy (2003) used the weights $\hat w_j(\tau)$ to define a weighted minimization problem
to account for censoring. The basic idea corresponds to the well-known interpretation of
the classical Kaplan-Meier estimator as an iterative redistribution of mass corresponding to
censored observations to the right. After obtaining preliminary estimators of the weights,
the second step was to select $\lambda$ as the minimizer of the function $CV(\lambda)$, which was computed
as follows.

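1. Randomly divide the data into $K$ blocks of equal size. Denote the corresponding sets
of indexes by $J_1,\dots,J_K$.

2. For $k = 1,\dots,K$, compute estimators $\hat b^{(J_k,\lambda)}$ based on the data
$(Z_i, X_i, \delta_i)_{i \in \{1,\dots,n\} \setminus J_k}$ and penalization level $\lambda$.

3. Compute

$$CV(\lambda) := \sum_{k=1}^{K} \sum_{j \in J_k} \sum_{i=1}^{N_\tau(n)} \Big\{ \hat w_j(\tau_i)\,\rho_{\tau_i}\big(X_j - Z_j^t \hat b^{(J_k,\lambda)}(\tau_i)\big) + (1 - \hat w_j(\tau_i))\,\rho_{\tau_i}\big(X^\infty - Z_j^t \hat b^{(J_k,\lambda)}(\tau_i)\big) \Big\},$$

where $X^\infty$ denotes some sufficiently large number (we chose $10^3$ in the simulations).
Select the penalty parameter $\lambda$ as the minimizer of $CV(\lambda)$ among a set of candidate
parameters.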
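The building blocks of this criterion can be sketched as follows. The sketch assumes the standard quantile check function $\rho_\tau(u) = u(\tau - I\{u < 0\})$ and takes the fitted values and weights as given; the function names and the dictionary of precomputed scores are illustrative assumptions, not part of the paper.

```python
from typing import Dict


def check_loss(u: float, tau: float) -> float:
    """Quantile check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def weighted_loss(w: float, x: float, fit: float, tau: float, x_inf: float = 1e3) -> float:
    """Contribution of one observation at one quantile level to CV(lambda).

    A fraction w of the mass stays at the observed value x, the remaining
    fraction 1 - w is placed at the large value x_inf.
    """
    return w * check_loss(x - fit, tau) + (1.0 - w) * check_loss(x_inf - fit, tau)


def select_lambda(cv_scores: Dict[float, float]) -> float:
    """Pick the candidate penalty level with the smallest cross-validation score."""
    return min(cv_scores, key=cv_scores.get)
```

Summing `weighted_loss` over folds, held-out observations and grid points gives the score for one candidate $\lambda$; `select_lambda` then returns the minimizer over the candidate set.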

The basic idea behind the above procedure is that the weights $\hat w_i$ are consistent 'estimators'
of the random quantities

$$w_i(\tau) = \delta_i + (1-\delta_i)\Big( I\{X_i > F^{-1}(\tau \mid Z_i)\} + I\{X_i \le F^{-1}(\tau \mid Z_i)\}\,\frac{\tau - F(X_i \mid Z_i)}{1 - F(X_i \mid Z_i)} \Big)$$

and that the minimizer of the weighted sum

$$\sum_{i=1}^{n} \Big\{ w_i(\tau)\,\rho_\tau(X_i - Z_i^t b) + (1 - w_i(\tau))\,\rho_\tau(X^\infty - Z_i^t b) \Big\}$$

is a consistent estimator of $\beta(\tau)$. See Portnoy (2003) for a more detailed discussion.
Remark 5.1 At first glance, it might seem that by redistributing mass to $X^\infty$ we would
give higher quantiles more importance since the corresponding quantile curves have crossed
more observations. However, while it is true that the total value of the sum

$$\sum_{j \in J_k} \sum_{i=1}^{N_\tau(n)} \Big( \hat w_j(\tau_i)\,\rho_{\tau_i}\big(X_j - Z_j^t \hat b^{(J_k,\lambda)}(\tau_i)\big) + (1 - \hat w_j(\tau_i))\,\rho_{\tau_i}\big(X^\infty - Z_j^t \hat b^{(J_k,\lambda)}(\tau_i)\big) \Big)$$

will be larger for higher quantiles, the magnitude of changes induced by perturbations of $\lambda$
will in fact be of the same order across quantiles. In a certain sense, this corresponds to the
invariance of regression quantiles to moving around extreme observations.

We considered two models. In the first model, we generated data from

$$\text{(model 1)} \qquad \begin{cases} T_i = (Z_{i,2},\dots,Z_{i,10})\,b + .75\,U_i \\ C_i = (Z_{i,2},\dots,Z_{i,10})\,b + .75\,V_i \end{cases}$$

where $b = (.5, 1, 1.5, 2, 0, 0, 0, 0, 0)^t$, $Z_{i,2},\dots,Z_{i,10}$ are independent $U[0,1]$ distributed random
variables and $U_i, V_i$ are independent $\mathcal{N}(0,1)$. The amount of censoring is roughly 25%.
In this model, all coefficients are bounded away from zero and so the local adaptive lasso
as well as the average penalization methods share the same $n^{-1/2}$ convergence rates. We
estimated the quantile process based on the grid $\tau_L = .15$, $\tau_U = .7$ with steps of size .01.
Our findings are summarized in Table 1, which shows the integrated [over the quantile grid]
mean squared error (IMSE) and the probabilities of setting coefficients to 0 for the two
estimates obtained by the different penalization techniques. All reported results are based
on 500 simulation runs and K = 5 in the cross validation. Overall, both estimators behave
reasonably well. The average penalization method is always at least as good as the local 
penalization method. It has a systematically higher probability of setting zero components 
to zero and a systematically lower IMSE for estimating the intercept and the coefficient fii- 
The second model was of the form 

$$\text{(model 2)} \qquad \begin{cases} T_i = (Z_{i,2},\dots,Z_{i,6})\,b + Z_{i,7}(U_i - q) \\ C_i = (Z_{i,2},\dots,Z_{i,6})\,b + 1.5 + V_i \end{cases}$$

where $q$ denotes the 30%-quantile of a standard normal random variable, $Z_{i,2},\dots,Z_{i,7}$ are
independent, $.2 + U[0,1]$-distributed random variables, $U_i, V_i$ are independent $\mathcal{N}(0,1)$
distributed, and $b = (2, 2, 0, 0, 0)$. The amount of censoring is roughly 20%. We have calculated
the quantile regression estimate for the model

$$Q_\tau(T_i \mid Z_i) = \beta_1(\tau) + \sum_{j=2}^{7} \beta_j(\tau) Z_{i,j}.$$
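A data-generating sketch for the second model may help to see where the zero crossing comes from. It uses only the Python standard library; the function names, the seed handling and the returned tuple layout are assumptions made for illustration. Since $Z_{i,7} > 0$, the conditional $\tau$-quantile of $T_i$ has coefficient $\Phi^{-1}(\tau) - q$ on $Z_{i,7}$, which vanishes at $\tau = 0.3$.

```python
import random
from statistics import NormalDist
from typing import List, Tuple

Q30 = NormalDist().inv_cdf(0.3)   # 30%-quantile of N(0, 1)
B = (2.0, 2.0, 0.0, 0.0, 0.0)     # coefficients of Z_{i,2}, ..., Z_{i,6}


def beta7(tau: float) -> float:
    """Coefficient of Z_{i,7} in the tau-th conditional quantile of T_i."""
    return NormalDist().inv_cdf(tau) - Q30


def simulate_model2(n: int, seed: int = 0) -> List[Tuple[float, int, List[float]]]:
    """Draw n observations (X_i, delta_i, Z_i) with X_i = min(T_i, C_i)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = [0.2 + rng.random() for _ in range(6)]       # Z_{i,2}, ..., Z_{i,7}
        lin = sum(bk * zk for bk, zk in zip(B, z[:5]))   # common linear part
        t = lin + z[5] * (rng.gauss(0.0, 1.0) - Q30)     # survival time T_i
        c = lin + 1.5 + rng.gauss(0.0, 1.0)              # censoring time C_i
        data.append((min(t, c), int(t <= c), z))
    return data
```

The function `beta7` changes sign at $\tau = 0.3$, which is exactly the situation in which point-wise penalization is expected to lose the $\sqrt{n}$ rate.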

In this model, the coefficient corresponding to $Z_{i,7}$ crosses zero for $\tau = 0.3$. From an
asymptotic point of view the estimator based on point-wise penalization should thus have a
n      method     β1     β2     β3     β4     β5     p2     p3     p4     p5     p̄
100    local      [row illegible in source]
100    average    31.1   16.2   18.4   16.6   15.4   40.9    3.4    0.0    0.0   75.8
250    local      [row illegible in source]
250    average    27.3   19.9   17.1   14.8   13.2   13.4    0.0    0.0    0.0   77.1
500    local      [row illegible in source]
500    average    20.0   18.4   13.1   13.2   13.4    2.2    0.0    0.0    0.0   80.8
1000   local      20.8   17.2   13.3   12.5   12.3    0.1    0.0    0.0    0.0   80.6
1000   average    19.3   16.0   13.0   12.5   12.3    0.0    0.0    0.0    0.0   84.7



Table 1: Results for model 1. Columns 1-5 show $n \cdot IMSE(\hat\beta_j)$, $j = 1,\dots,5$, where $\beta_1$
corresponds to the intercept. Columns 6-9 show the probabilities $p_j$ of setting the coefficient
$\beta_j$ to zero ($j = 2,\dots,5$) averaged over all quantiles on the grid. Column 10 shows the average
probability $\bar p$ of setting coefficients $\beta_6,\dots,\beta_{10}$ to zero. Rows with label 'local' correspond to
(local) adaptive lasso, rows with label 'average' correspond to average adaptive lasso.

slower rate of convergence in a neighborhood of $\tau = 0.3$. First, consider the results in Table 2
for the IMSE and the probabilities of setting coefficients to 0. We observe the same slight but
systematic advantages for the average penalization method with respect to model selection
properties and integrated MSE. Note that this is consistent with the theory since the range
of quantiles where the local penalization has a slower rate of convergence is shrinking with
$n$. Plotting the MSE of the estimator $\hat\beta_7$ as a function of $\tau$ reveals a rather different picture
[see Figure 1]. Here, the suboptimal rate of convergence of the local penalization and the
clear asymptotic superiority of the average penalization become apparent.
Figure 1: $n \cdot MSE$ of the estimate for the coefficient $\beta_7$ as a function of the quantile for
sample sizes n = 50 (upper left), n = 100 (upper right), n = 250 (lower left) and n = 1000
(lower right). Solid line: local penalization. Dashed line: average penalization.
n      method     β1     β2     β3     β7     p2     p3     p7     p̄
100    local      [row illegible in source]
100    average    17.0    9.1    9.6   13.4    0.0    0.0   39.4   79.7
250    local      [row illegible in source]
250    average    14.0    7.7    8.1   12.3    0.0    0.0   19.8   82.1
500    local      [row illegible in source]
500    average    12.8    7.9    7.9   11.2    0.0    0.0   11.9   87.0
1000   local      13.8    7.4    7.2   13.5    0.0    0.0   17.4   83.5
1000   average    12.6    7.4    7.2   12.3    0.0    0.0    8.1   90.7



Table 2: Results for model 2. Columns 1-4 show $n \cdot IMSE(\hat\beta_j)$, $j = 1,2,3,7$, where $\beta_1$
corresponds to the intercept. Columns 5-7 show the probabilities $p_j$ of setting the coefficient
$\beta_j$ to zero ($j = 2,3,7$) averaged over all quantiles on the grid. Column 8 shows the average
probability $\bar p$ of setting coefficients $\beta_4,\dots,\beta_6$ to zero. Rows with label 'local' correspond to
(local) adaptive lasso, rows with label 'average' correspond to average adaptive lasso.



6 Appendix: proofs 



At the beginning of the proofs, we give a brief overview of the main results. Several auxiliary
results are proved in Section 6.1. A first key result here is Lemma 6.2, which provides
some general bounds for $\mu_k(\hat\beta(\tau_j)) - \mu_k(\beta(\tau_j))$. Moreover, conditions that describe when
coefficients $\hat\beta_k$ are set to zero are derived. These results will play a major role in the proof
of the subsequent results. Lemma 6.4 shows that $\sqrt{n}(\mu(\hat\beta(\cdot)) - \mu(\beta(\cdot)))$ is uniformly close
to $\sqrt{n}(\phi_n(\cdot) - \mu(\beta(\cdot)))$, which in turn is obtained as the solution of an iterative equation.
Thus the asymptotic distributions of the two aforementioned quantities coincide. We will
then proceed in Lemma 6.5 to derive an explicit, i.e. non-iterative, representation for the
quantity $\sqrt{n}(\phi_n(\tau_j) - \mu(\beta(\tau_j)))$. This will yield a Bahadur representation of the process
$\sqrt{n}(\mu(\hat\beta(\cdot)) - \mu(\beta(\cdot)))$, which in turn is the main ingredient for establishing the representation
for $\sqrt{n}(\hat\beta(\cdot) - \beta(\cdot))$. Since the proofs of the results in Sections 3 and 4 are similar, we only
give detailed arguments for the results in Section 4 [which are more complicated] and briefly
mention the differences where necessary.



6.1 Preliminaries 

We begin by stating some useful technical facts and introducing some notation that will be 
used throughout the following proofs. 

Remark 6.1 

(1) Under condition (C3) it follows that, for any bi,b 2 G B{T,e), ||/i(bi) — yu(b 2 )|| < 
C 2 ||bi - b 2 || with C 2 := dC 2 z K f and ||/2(bi) - /2(b 2 )|| < C 3 ||bi - b 2 || with C 3 := dCftf/. 



(2) Condition (C4') implies the inequality 

||/xW r ))(bi x(r)) )-/xW r ))(b? (r)) )|| > 



(b (xM) )b (xW) ) || b (xM)_ b ^: 



lx(r)| 



(xW)i 



where £ r (bS x(r)) , b^ x(r)) ) := A({ 7 € [0,1] : || 7 bS x(r)) + (1 - 7 )bj, x(r)) - /3(t)\\ < e}) and A 
denotes the Lebesgue measure (a sketch of the proof is below). In particular, the above 
equation implies that for all b with Hb^ 1 "^ — /3(r)|| < -^jj with C\ := d/\ it holds that 

|| b C*«> _^( T )|| < ^y^\b^ T)) )-^\P(r))\\. 

For a proof of the inequality above, note that 

ijiiis; j ) - ^iiii^(b^) -pM(bP)\\ > (bS j) -b^)*(^)( b ^) -^)( b ^)) 

= E[(Z (J) )*(bS J) - b^ J) )(F(Z'b[ J) |Z) - F(Z*bj, J) |Z))] 



= E 



(Z(^(bi J) - b( J) )(Z(^)*(bS J) - b( J) ) jf 1 /(Z'( 7 b[ J) + (1 - 7 )b^)|Z)rf 7 ] 



(bS J) - b^) t E[(Z( J ))(Z( J ))V(Z*( 7 bS J) + (1 - 7 )b( J) )|Z)j (b< J) - b^dj 

(3) For ||b^W) - i3(t)\\ < e we have 

/i(bW T )»)-^(r)) = M T (ji MT)) (b^) -^ WT)) (/3(r))) +D T (b), 
/i(bW^)-/i(/?(r)) = JW r (/}W r »(b (x(r)) ) -^W r ))(/3(r))) + D T (b) 

where sup r ||D T (b)|| = 0(||b - /3(r) || 7 ), sup r ||Z> T (b)|| = 0(||b - /3(t)||t). Introduce the 
notation 



V(a) := sup sup ||A-(b)||, V(a) := sup sup ||A-(b)||. 

re[T L ,Tu} \\b-l3(r)\\<a Te[T L ,Tu] ||b-/3(r)||<a 



(6.1) 



(4) Assumptions (C2)-(C4) imply the existence of finite constants C 5 , C 5 such that for any 
||b(*«) -/3(r)|| < e we have 



||/i(b x(r) ) -//(/3(r))|| < C 5 ||// (x(r)) (b (x(r)) ) -P,M t »(P(t) 
\\fl(b^) - mr))\\ < C 5 \\^\b^)-^)) {f3{T) 



(6.2) 
(6.3) 
Lemma 6.2 Let J C {1, d], e M. d ' J ' and consider the problem of minimizing Hj(Vj 1 (h t , 0*)*) 
with respect to h 6 IR' J L Denote the generalized solution of this minimization problem by 
h(Tj) and set = Pj^hfo)*, 0')*. Then 

HkiP) ~ M/3 (J) (^)) + »n,S) ~ [ K,Mu))dH{u) - I fi k 0(u)) - fi k ((3(u))dH(u) 

J[ro,Tj) J[ro,Tj) 



— - y^(Zi,fe — EZj i;) 



i=l 



- -^ + ^ + ll^ (J) (^))-^ (J) (/3 (J) ( 



p k (n,Tj) n 
for k e J. 



Ti 



Now, let conditions (P), (C1)-(C3) and (C4') hold and additionally assume that for some 
J C x{Tj) 



ai + a 2 + sup — r + — + C 2 sup \p k (Tj))\ < 7 ' JJn 

keJPk\n,Tj) n kejC C x V 1 



SUP — ; 

fceJ p k {n,r j/ 



/2C \ 
+ (C 5 + 1) I — + ai + a 2 + C 2 sup \/3 k (rj))\ ) < inf 

V n k£j c J k<EJ° p h {n,Tj) 



A, 



(6.4) 
(6.5) 



where 



sup 



/. n 

(b) + / v n 0{u))dH{u) +\\^Y(Z i -EZ i ) 



l T 0,Tj) 



< on 

< a 2 . 



Then any minimizer of Hj defined in (2.6) is of the form Vj 1 (h(r 7 ) t , 0*)* where h(rj) is a 
minimizer of Hj(Vj l (h t ,0 t ) t ) over h e IR' J L 

Proof In order to simplify the presentation, assume w.l.o.g. that $J = \{1,\dots,L\}$, that
$\inf_{k \in J} p_k(n,\tau_j) = p_L(n,\tau_j)$ and that $\sup_{k \in J^C} p_k(n,\tau_j) = p_{L+1}(n,\tau_j)$. Define



-2^ (//(b) - Mrj)) + u n {b) - I u n (/3(u))dH(u) 



(6.6) 



+2^ / /2(0(u)) - fl(P(u))dH(u) + - V /{X, = ZftK^'Zi + |e*Z f -|) 



fc=i 



+2A„X:(&^fM +/{6t = o} 
f— ' V Pk[n,Tj) 






and note that finding all minimizers of the function i2j-((h*, 0*) ) in (2.6) over h G R is 
equivalent to finding all points b = (h*,0*)* that satisfy 

inf Vj(b,£)>0. 
For a proof of the first part of the lemma, observe that by simple algebraic manipulations 
and the condition on \1> we have 



2 " 4A 
< ^(b, -e k ) = -^(b, e k ) + - I iX l = Z*b}|e* Z,| + n I{b fc = 0}. 

This directly yields, 

2 ™ 4A 
*i(b,e fc ) < - ^/{X, = Z*b}|elZ,| + -^-I{b k = 0}, 

and by assumption we have < ^(b, e^). From that we obtain for k — 1, L 

- M0fo)) + ^ )fc (/3) - / i>„, fc (/3(«))d#( M ) - / Mfa)) - vMu))dH{u) 

n 

^(Zj,fe — EZj )fe ) 
i=l 

= - ^(b, e k )--J2 1{Xi = Z*b}(^Z iife + |Z i)fc |) - " (s<?n(b fc ) + = 0}) 

< > (h(b,e t ) - i ± I {Xi = Z< b} |Z u | - 2 Mj^| + 2V^bt#0} + c ?) 
2 V Pk(n,Tj) Pk{n,Tj) nJ 

p k (n,Tj) n 

almost surely. A simple application of the triangle inequality completes the proof of the first 
part of the lemma. 

For a proof of the second part, assume w.l.o.g. that $J = \{1,\dots,L\}$ and that the assumptions
made at the beginning of the proof of the first part hold. In particular, under these simplifying
assumptions V Tj is the identity matrix. Start by noting that 

^(b,6 + 6) = *i(b,&) + ^(b,6) - ~Y,i{x l = z^Xl^z.l + |^z,| - 1(6 + 

1=1 

-^E ?r T ? (i^i + - + &*!)■ 

fc=l t>^\ n -> T 3> 

In particular, for the special case = (C*, 0^_ L )*, C° = (0^, 6"*)* with (Gl 1 ,^ M d_L , the 
last line in the above equation equals zero. Moreover, \a\ + \b\ — \a + b\ < 2\b\, and thus 
|^Zi| + |^Zi| - + f 2 )%| < 2|£*Z;|. Hence, if we can show that 

on _ j _ 

*;(/W) + *;(ft3)>^ E 1^1 

i=L+l 



n 

T 



i=i 
for any $,£2 of the form given above, it will follow that \Pj (/?,£) > for all £ G M. d . By 
the definition of $ we have £1) > 0, and thus it remains to verify that ^(/S, £2) — 

I i I ■ To this end, observe that the arguments in the first part of the Lemma yield 



2C Z 



the bound [the last inequality follows under (6.4)] 



|^)(£)_^)(^)( r .))|| < ai + a 2 + 



A, 



< 



e-swp k>L \Pk(Tj))\ 
d V 1 



+ ^ + C 2 sup|/3 fc (r i ))| 
n k >L 



since by assumption (C3), condition (6.4) and Remark 6.1 we have 



l/^ (J) (/%)) 



^)(/3^(r,))||<C 2 sup|/3 fc (r J ))|. 

k>L 



Thus H^W^'^ — /3(rj)|| < e and (6.2) together with the triangle inequality implies that 

A„ Cz 



||M/3)-/i(/3(r,))|| <C 5 «i + a 2 + 



+ C 2 sup| / 5 fc (r i ))| +C 2 sup|/3 fc (r i ))|. 
p L (n,Tj) n k>L ) k>L 



By the definition of $ and the assumption on p k (n,Tj) made at the beginning of the proof 
we have 



a 

2A n (^2,A 



?0 I 
»2,fcl 



. sgn(p k ) - _ m l£ 2 , 

\,k — 7 7 + i{Pk - Uj — ; 

, , Vk{n,Tj) Pk{n,T j/ 



2A n 



\^2,k\ 



> 



Pl- 



^1 v If 



k=L+l FKK ' 3/ x-,* x , j, k=L+ 

Combining all the inequalities derived above, we see from the definition of ^ that 

2a 1 -2a 2 -2|| / u(/3)- / u(/3(r J 



2,fcl 



*i(b,e 2 °) > £ \c 

k=L+l 



2,fc 



2C 2 



3> 



n 



Thus under (6.5) it holds that > 



2(C 5 +l)Cz 



Eie 



2il 



— ^r^Sl£ 2 «l an d we have 

proved that is a minimizer of the function -H'(b) in the set M. d . It remains to verify that 
every minimizer is of this form. We will prove this assertion by contradiction. Assume that 
there exists a minimizer b with h k ^ for some k > L. Since the set of minimizers is convex, 
any convex combination of b and a minimizer $ with $ k = would also be a minimizer. 
Thus there must exist a minimizer b with k'th component different from zero and all other 
components arbitrarily close to the components of $. In particular, we can choose b in such 



a way that ||/i(b) — < 5 z . Setting b = b,£ = ±e k in representation (6.6) we obtain 



a contradiction, since in this case the sum in the last line will take the values ±2A„ 
and the absolute value of this quantity dominates the rest of \I/j(b,£) by construction and 



sgn(b k ) 
Pk(n,Tj) ' 



condition (6.5). Thus a minimizer with $b_k \neq 0$ for some $k > L$ cannot exist and the proof is
complete. □
Lemma 6.3 Under assumptions (C1)-(C4) and (D1) the unpenalized estimators obtained
from minimizing (2.5) are uniformly consistent in probability, i.e.

$$\sup_{\tau \in [\tau_L,\tau_U]} \|\hat\beta(\tau) - \beta(\tau)\| = o_P(1).$$



Proof Define the quantities 

R n ,i ■= C M ( sup |K(b)|| +H(tjj) sup ||P„(b)|| 



I — 
n 



^(Z,-EZ, 



i=l 



op(1), 



r n>1 := C 5 (i2»,i + C 6 b 2 n + ^) and 



Tin ■= [r n ,\ + 



C()b n 

~c7 



sup 



l + C 5 b n ) N ^ =o P (l) 



Use similar arguments as in step 1 of the proof of Lemma 6.4 [set A H) i = 0, A nj i = +oo] to 
inductively show that on the set Q n := ^JZ n < j whose probability tends to one we 
have 



(i) the conditions (6.4) and (6.5) of Lemma 6.2 hold with J = {1, ...,d}. 

(ii) we have the following upper bound 

HM/%0) - KPirM < r n ,i(i + c 5 b n y + 2&((1 + c 5 b n y - 1) =: r n , j+1 

^5 



< r„i + 



C 6 b n 



sup(l + C 5 b n ) N ^ =K n = o P (l). 

n 



In particular, the results above and an application of Remark 6.1 imply that

$$\sup_{j=1,\dots,N_\tau(n)} \|\hat\beta(\tau_j) - \beta(\tau_j)\| = o_P(1).$$

Since $\hat\beta(\tau)$ is constant between grid points and additionally $\beta(\tau)$ is Lipschitz-continuous,
this completes the proof.

□ 
Lemma 6.4 Define the triangular array of random $\mathbb{R}^d$-valued vectors $\phi_n(\tau_j)$ as

0n(r o ) - M/3(r )) = ~M T0 (- < Z*/3(r )} - r )J (6.7) 



i=l 



and for j = 1, N T 

n (r,) - = A* Tj f - ^(/3(r,-)) + / u n (P(u))dH(u) + - V(Z, - EZ,) (6.8) 

+ E / M u dH(u) L n ( n ) - fi(P(r t )) 

(a) Let assumptions (C1)-(C4) and (D1)-(D3) hold and denote by $\hat\beta$ the unpenalized
estimator obtained from minimizing (2.5). Then



rasup \\fJt0{Tj)) - <f>n(Tj)\\ = P {n 1/2 b n + n 7/2 + w Cnn -i/a(>/ru>n) + ^ Cnn -i/2{y/ni> n )) 
j 

(b) Let assumptions (C1)-(C3), (C4'), (D1'), (D2)-(D3), (P) hold and denote by $\hat\beta$ the
penalized estimator obtained from minimizing (2.6). Then $\sqrt{n}\,\sup_j \|\mu(\hat\beta(\tau_j)) - \phi_n(\tau_j)\| = o_P(1)$
and $P\big(\sup_{\tau_j} \sup_{k \in \chi(\tau_j)^C} |\hat\beta_k(\tau_j)| = 0\big) \to 1$.

Proof. The proof of part (a) is similar to, but simpler than, the proof of part (b). For this
reason, we will only state the proof of (b) and point out the important differences where
necessary. The proof will consist of two major steps. In the first step we define the set



a 



{n n < } n {(i + c 5 )ii n + c 5 A nA < a„, } n n 0>n 



with fi ,n denoting some set such that P(f2o,n) — ► 1 an d note that P(fl n ) — > 1, here [the 
bound will be proved below] 



K n := fr n>1 + sup(l + Cfc&n)^ = P (n-^) 

and r n ,i := C 5 (p„,i + C 6 b 2 n + ^ + A„,i) with 



Rn,i :=C M (sup 11^)11+^) sup ll^tyll + lpyVZi-EZi) ) =0 P (l/v^). (6.9) 

V beR d beR d " n i=1 ' 

For a proof of (a), proceed in a similar fashion but with x{ r ) = {1? ■■■,d} for all r, setting 
A n i = 0, A n0 = 00 and replacing R n l in the definition above by 



Rn.l ■= C M [ SUp \\u n 
V beB([r i ,r i7 ], £ ) 



n 

(b)\\+H(ru) sup ||z> n (b)|| + |p VCZi-EZi) ) 



beB([T L ,Tu], e ) 
Here, uniform consistency of the unpenalized estimator [see Lemma 6.3] implies that only
the supremum over $b \in B([\tau_L,\tau_U],\varepsilon)$ needs to be considered.

In what follows, we will inductively show that on the set $\Omega_n$ we have for every $0 \le j \le N_\tau(n)$
[recall that $N_\tau(n)$ is the number of grid points]



(i) the conditions (6.4) and (6.5) of Lemma 6.2 hold [the quantities $\alpha_1, \alpha_2$ will depend on
$j$ and be specified in the proof below].

(ii) $\hat\beta_k(\tau_j) = 0$ for $k \in \chi(\tau_j)^C$.



(iii) we have the following upper bound 



\n^0( Tj )) - ^KPirM < r M (l + C 5 b n y + ^((1 + C 5 b n y - 1) =: r nJ+1 



C§b r , 
~C 5 



< (r n> i + 3^) sup(l + C s K) Nr ™ =K n = Op{nW 



In the second step, we will prove the bounds 

SUp " 4>n(Tj)\\ < Sn.1 BU P (1 + dC M b n ) N ^ 



(6.10) 



where s n> x = op(n 1 ^ 2 ) in case (b) and 

= Op{b n + n" (1+7/2) + W Cnft -l/2(l/„) + W Cnn -i/ 2 (i> n )) 

in case (a). 

Step 1: Proof of (i), (ii) and (iii).

First, consider the grid point $\tau_0$. Classical arguments yield the existence of a set $\Omega_{0,n}$ such
that $P(\Omega_{0,n}) \to 1$ and (ii)-(iii) hold on this set. The details are omitted for the sake of
brevity.



Next, observe that for the grid point n we have for k £ {1, ...,d} [apply Remark 6.1 



jikifcu)) - fi k {(3{u))dH{u) < r n>1 + C 6 b 2 n =: R n>2 = P {n~ 1 ' 2 ). 



[to.ti) 



Defining aj := R n j (j = 1,2) we obtain that conditions (6.4) and (6.5) of Lemma 6.2 hold 
with j = 1 on the set 

«l,n ■= { — + Rn,l + Rn,2+An,l < T^TT \ R { i 1 + C ^ (~ + R n,l + Rnfl) + C 5 A„,i < A n , [ • 



Finally, note that by the first part of Lemma 6.2 we have for k £ x( r i) 



2C 

W0(n)) - Mfc(/3(ri))| < i? n ,i + R n , 2 + — + A n ,i 

n 
[the constant 2 in front of $C_Z$ will play a role later] which implies (iii) on the set $\Omega_{1,n}$.
Now, proceed inductively. Assume that (i)-(iii) have been established for $1,\dots,j$. For the

grid point $\tau_{j+1}$, observe that for $k \in \{1,\dots,d\}$

! fa(P(u)) - fa(p(u))dH(u) < R n , 2 + b n J2(^r n ,i + C 6 b n ). 

</[t ,t j + i) i=1 



Thus, setting a\ = Rn,i, CX2 '■= R n ,2 + b n Y^i=i{Csr n ^ + Cob n ) we obtain that conditions (6.4) 



and (6.5) of Lemma 6.2 hold on the set 

C 



Qj+l,n '■= | + R n ,l + R n ,2 + b n ^^(Ce&n + Csr + A n< i < — - — j 



n 



n{(l + C 5 ) (— ^ + R nA + R nt2 + b n J2(C 6 b n + C 5 r n ^ + C 5 A nA < A n>0 }. 
This yields (i) and (ii) for Tj+i on the set Qj + i %n . Finally, note that by the first part of 



Lemma 6.2 we have for k G x( T j) 

j 

W{K T i+i)) - M/5( r j+i))l < r n,i + b n ^2(C 5 r n>i + C 6 b n ). 
Inserting the definition of r n ^ for k = 2, j, some algebra yields 

r n ,i + b n y^{^ r n,j + c eb n ) = r n>1 (l + C 5 b n ) J + C 6 b n 1 + C, ^ n 1 = f n j^ j . 



which completes the proof of (iii) for Tj+i. This shows Q n C njfi J)n and completes the first 
step. 



Step 2: 



First of all, note that (iii) from the first step in combination with Remark 6.1 shows that 

supH^O-^^OIHOp^- 1 / 2 ). (6.11) 



In order to establish (6.10), note that on the set f2 n Lemma 6.2 in combination with Remark 



6.1 yields 



< \\M T] ( - VnWi)) + / u n (P(u))dH(u) + ^ V(Z, - EZ, 
+ J2 [ M T] M u dH{u) (0 n (rO - n{p(r$) 
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„ n 

+M Tj (u n 0(Tj)) - / u n 0(u))dH(u) - ^ V(Z, - EZO 
v Mrs) n ~[ 



jl(f3(u)) - jl(f3(u))dH(u) 



n 



+ 

+Ka + 



Now for n large enough and c n — > oo we have by (6.11 ) 



Vntffa)) + / uMu))dH(u) + (vMtj)) - / uMu))dH{u] 



< K 



where := (f n ) + H(T U )ui Cnn -i/2(v n ) and moreover [here, P(t) is defined in (6.1)] 



/}(/?(«)) - p,(P(u))dH(u) - / M u dH(u)(ji(/3(Tj)) - /^(r,))) 

'[ T j: T j + l) "'[ T j. T J + l) 

< (V(TZ n ) + db n C M C 7 )(H(r j+1 ) - H{r 3 )). 



In particular, this implies 



j'-i 



- ji(p(u))dH(u) - / M u dH(u) ( n (/^)) - M/?^)) 

fajlTj - ) i=0 "'[ T i:' r i + l) 



< 



H(ru)(V(n n ) + db n C M C 7 ) + db n C M HM/fe)) 



- n (Ti 



i=0 



Summarizing, we have obtained that for $j > 0$ on the set $\Omega_n$



Cz 

n 



3-1 



< A nA + — + V n + V{K n ) + H{ Tu ){V{n n ) + db n C M C 7 ) + db n C M V HMfe)) 
Defining 

Sn,l ■= An,l + — + v n + V{Tl n ) + H(Tu)(V(n n ) + db n C M C 7 ) 

n 



[Ti 



i=0 



we obtain ||//(/3(rj + i)) — n (rj + i)|| < s n j + i. Moreover, induction yields 

$$s_{n,j+1} = (1 + dC_M b_n)^{j+1} s_{n,1} \le s_{n,1}\sup_n (1 + dC_M b_n)^{N_\tau(n)}.$$

This completes the proof. 



□ 
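
The final bound in the proof above is useful because the factor $(1 + dC_M b_n)^{N_\tau(n)}$ stays bounded whenever $b_n N_\tau(n) = O(1)$. The Python sketch below illustrates this under assumed values: a grid of total length 1 with mesh $b = 1/N$, and a constant `C` standing in for $dC_M$. Neither the names nor the numbers come from the paper.

```python
import math


def growth_factor(c, b, n_steps):
    """Return (1 + c*b)^n_steps, the factor multiplying s_{n,1} after n_steps grid points."""
    return (1.0 + c * b) ** n_steps


# Assumed illustration: mesh b = 1/N on a grid of total length 1, with C standing in for d*C_M.
C = 2.0
factors = {n: growth_factor(C, 1.0 / n, n) for n in (10, 100, 1000, 10000)}
upper_bound = math.exp(C)  # valid because (1 + x/N)^N <= exp(x) for x >= 0

for n, value in factors.items():
    print(n, round(value, 4), value <= upper_bound)
```

The factors increase with the number of grid points but never exceed $\exp(C)$, so the supremum over $n$ is finite.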
Lemma 6.5 Under the assumptions of Lemma 6.4 we have for $j = 0, \dots, N_\tau(n)$

= M Tj (w n (Tj 

uniformly in j where for some finite constant C we have sup,,- ||i? n (rj)|| = Op{r n ) with 



4 7T (I d +(M v M v ydH(v) )) M u M u w n (u)dH(u) )+R n {r j ) 

[r ,r 3 ) M^] V ' ' ' 



C([b n + sup \\M U - M v \\ + \\M U - M v \\ ) sup \\w n {u) 



+ sup \\w n (u) - w n (v)\\ j. 



\u—v\<a n 



$I_d$ denotes the $d \times d$ identity matrix, $\pi$ denotes the product-integral [see Gill and Johansen (1990)] and we defined

$$w_n(\tau) := -\frac{1}{n}\sum_{i=1}^{n}(Z_i - EZ_i) - \nu_n(\tau) + \int_{[\tau_0,\tau)} \tilde\nu_n(u)\,dH(u).$$

Proof. Throughout this proof, denote by C some generic constant whose value might differ 
from line to line. Start by noting that the solution of the iterative equation (6.8) is given by 

3 , 3 



A*C8(r i+1 )) = M Tj+1 J2( l[(l d + [ (M n M u YdH(u 

1=0 i=i+i J[n,n +1 ) 



(w„(r/ + i) - w n (ri)) 



+- M ^(n( / ' 

i=0 



d + 



(M n MuYdH(u) ))w n (r ), 



[Ti,n+i) 



this assertion can be proved by induction [here, a product $\prod_{i=a}^{b} C_i$ with $a > b$ is defined as
the unit matrix of suitable dimension]. Next, observe that summation-by-parts, that is

$$\sum_{k=m}^{n} f_k(g_{k+1} - g_k) = f_{n+1}g_{n+1} - f_m g_m - \sum_{k=m}^{n}(f_{k+1} - f_k)g_{k+1},$$

yields



E(IK'- + 

-1 



(MnMufdH^u) J J (w B (7j + i) - UJ n (ri)) 



i=0 i=Z+l 



j r j , 

IdW n {r j+l ) - 22 I II ( /d " 

1=0 i=l+2 



[n,n+i) 



-n('< 

wn^+i) + ( n + 

Z=0 i=l+2 



(M n M u y dH(u) jj w n (r ) 
(M n M u ydH(u 

[n,T i+ i) 

\ -it 

{M Ti M u ydH(u) w n (r l+1 ) 



{M Ti M u ydH(u] 



n,n+i) 



M u M n+1 dH(u)w n {r^ 



[r !+ i,r !+2 ) 
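
The summation-by-parts identity used in this step also holds when the $f_k$ are matrices and the $g_k$ are vectors, provided the order of multiplication is kept. The following Python sketch checks this on small integer examples; the helper names and the sequences `F_SEQ` and `G_SEQ` are assumptions made for illustration only and have no counterpart in the paper.

```python
def mat_vec(a, x):
    """Multiply matrix a (list of rows) by vector x."""
    return [sum(a[i][j] * x[j] for j in range(len(x))) for i in range(len(a))]


def mat_sub(a, b):
    """Entrywise difference of two matrices of equal shape."""
    return [[a[i][j] - b[i][j] for j in range(len(a[0]))] for i in range(len(a))]


def vec_add(x, y):
    return [xi + yi for xi, yi in zip(x, y)]


def vec_sub(x, y):
    return [xi - yi for xi, yi in zip(x, y)]


def left_side(f, g, m, n):
    """sum_{k=m}^{n} f_k (g_{k+1} - g_k)."""
    total = [0] * len(g[0])
    for k in range(m, n + 1):
        total = vec_add(total, mat_vec(f[k], vec_sub(g[k + 1], g[k])))
    return total


def right_side(f, g, m, n):
    """f_{n+1} g_{n+1} - f_m g_m - sum_{k=m}^{n} (f_{k+1} - f_k) g_{k+1}."""
    total = vec_sub(mat_vec(f[n + 1], g[n + 1]), mat_vec(f[m], g[m]))
    for k in range(m, n + 1):
        total = vec_sub(total, mat_vec(mat_sub(f[k + 1], f[k]), g[k + 1]))
    return total


# Arbitrary (assumed) integer test sequences: 2x2 matrices f_k and vectors g_k.
F_SEQ = [[[k + 1, 2 * k], [k * k, 1]] for k in range(8)]
G_SEQ = [[k * k - 1, 3 * k + 2] for k in range(8)]

print(left_side(F_SEQ, G_SEQ, 1, 6) == right_side(F_SEQ, G_SEQ, 1, 6))
```

Integer entries are used so that the comparison is exact and not affected by rounding.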
At the end of the proof, we will show that



sup 

k,j,j<k 



k-l . 

TT + / (M n M u YdH(u)) - IT (h + (M u M u ydH(u) 
T=j J[n,T i+ i) ' v 



< Cd r , 



(6.12) 

where $d_n := b_n + \sup_{|u-v| \le a_n,\ \theta_k \notin [u,v]\ \forall k}\big(\|M_u - M_v\| + \|\tilde M_u - \tilde M_v\|\big)$. Moreover, we note that



sup sup 

k,j,j<kv£{T j ,T j+1 ] 



7T (l d + (M u M u ydH(u)) - 7T ( I d + {M u M u ydH(u) 

(Tj,T k ] V / (v,Tk\ V 



<Cb r . 



since 



7T {aM (l d + (A4 u A4 u )*dtf(7i)) < exp(dC M (H(b) - H(a))) by inequality (37) from 



Gill and Johansen 
H(v))exp(dC M {H 



(1990) and 



7T 



/ d + (M u M u )*rfif(n; 



< dC M {H{r j+1 ) - 



)) by inequality (38) from the same reference. This yields 



i-i i 

su p||e( n 

i=0 i=/+2 



[n,Ti+i) 



(MuM u ydH(u)w n (ri +1 ) 



, TT (l d +(M u M u ydH{u))) M v M v w n (v)dH(v) 

[76,75+1) V (^ j + 1 ] V 7 7 



[t; + i,t; +2 ) 
f 



< Crv 



t _ 



since J [t ^ ti) ( 7T(„ iT1 ] [I d +{M u M u ydH{u) ) ) A^A^u^^c/F^) < C6„sup u ||w n («)ll- Thus 



it remains to establish (6.12). To this end, we note that 

k-l 



]J(l d + [ (M Ti M u ydH(u)) -i[(l d + [ (M u M u ydH(u 
i=j J[n,n+i) ' i=j J Fi.Ti+i) 

k-l i-i „ 

E(n( /d+ / (M n M u ydH(u) 
i=j i=j J[n,n+i) 



h ,7-i+i ) 



xi / {M n M u ydH{u) 

[7J,7;+i) 
fc-1 „ 

x J] ( J ^+ / (M u M u ydH(u 
i=i+i J[n,n +1 ) 



{M u M u ydH{u 
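
The representation above is an instance of the telescoping identity for a difference of two ordered matrix products, $\prod_i A_i - \prod_i B_i = \sum_l \big(\prod_{i<l} A_i\big)(A_l - B_l)\big(\prod_{i>l} B_i\big)$, with empty products read as the identity matrix. The Python sketch below verifies this identity on small integer matrices. It is an illustration under assumptions: the helper names, the test matrices and the left-to-right ordering of the factors are choices made here, not taken from the paper.

```python
def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_add(a, b):
    return [[a[i][j] + b[i][j] for j in range(len(a))] for i in range(len(a))]


def mat_sub(a, b):
    return [[a[i][j] - b[i][j] for j in range(len(a))] for i in range(len(a))]


def ordered_product(mats, n):
    """Multiply the matrices left to right; an empty list gives the identity."""
    result = identity(n)
    for m in mats:
        result = mat_mul(result, m)
    return result


def telescoped_difference(a_seq, b_seq):
    """sum_l (A_0 ... A_{l-1}) (A_l - B_l) (B_{l+1} ... B_{K-1})."""
    n = len(a_seq[0])
    total = [[0] * n for _ in range(n)]
    for l in range(len(a_seq)):
        head = ordered_product(a_seq[:l], n)
        tail = ordered_product(b_seq[l + 1:], n)
        term = mat_mul(mat_mul(head, mat_sub(a_seq[l], b_seq[l])), tail)
        total = mat_add(total, term)
    return total


# Arbitrary (assumed) integer matrices so that the comparison is exact.
A_SEQ = [[[1, i], [2, 1 + i]] for i in range(5)]
B_SEQ = [[[1, 2 * i], [0, 3 - i]] for i in range(5)]

direct = mat_sub(ordered_product(A_SEQ, 2), ordered_product(B_SEQ, 2))
print(telescoped_difference(A_SEQ, B_SEQ) == direct)
```

In the proof, each summand is then bounded separately: the middle factor is small and the outer products are uniformly bounded.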



Next, observe that 



sup 

h0 k i[TuT l+ {f1k 



(M n M u ydH(u)- / (M u M u ydH(u) 

h,n+i) J[ T i,n+i) 



< Cb n sup \\M U 

\u— v\<.a n fi k (fi\u,v\ik 



and 



sup 

l:3k:e k &[ri,Ti +1 ) 



(M Tl M u ydH(u) - / (M u M u ydH(u) 

[t;,T !+ i) J[n,r l+1 ) 



< Cb n . 
Finally, note that $k - 1 - j \le N_\tau(n)$, $b_n N_\tau(n) = O(1)$ and that Jj r; ^ ^ M n M u dH{



u 



< 



Cb r , 



j [Tun+i) M Tu M u dH{u) 



< Cb n uniformly in Z, which yields 



sup 



l[(ld+ (M n M u ) t dH(u)') - nfc+ / (M u M u ydH(u 

i=j J[n,T i+ i) J i= j J[Ti,n + i) 



< Cd r . 



since there are only finitely many different $\theta_k$. Finally, the bound

fc-i „ 

sup ||TTf^+ / (M u M u YdH(u)) - 7T (l d + (M u M u YdH{ 

k,j,j<k II t~ V ■/[T i ,7i + i) y ( r ^ T *l V 



< CcL 



can be established by using equations (37), (39) in Gill and Johansen (1990) and the representation



1=3 

k-l l-l 

E(n('< 

1=3 l =3 



[T»,Ti+l) 



rr (i d + [ (M u M u ydH( U )) - n tt (i d + {M u M u ydH{ 

x J[n,r i+1 ) ' f4 (Ti,ri+l]V 

(A*JW„)<dff( U ))) X 

Id + / (M u M u ydH(u) - 71 (l d + (M u M u ydH(u) ) ) < 

7T (l d + (M u M u ydH(i 

n+i,Tk] \ 



This completes the proof. 



□ 
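
The proof above relies on two standard facts about product-integrals: finite products over a fine grid approximate the product-integral, and its norm is bounded by an exponential [inequality (37) in Gill and Johansen (1990)]. The Python sketch below illustrates both in the simplest case of a constant matrix $A$ and $H(u) = u$, where the product-integral over $(0, t]$ equals the matrix exponential $\exp(tA)$. The matrix `A`, the value `T`, the helper names and the use of the maximum row sum norm are assumptions made for this illustration.

```python
import math


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def riemann_product(a, t, n_steps):
    """Finite product (I + A*t/n_steps)^n_steps approximating the product-integral over (0, t]."""
    d = len(a)
    step = [[(1.0 if i == j else 0.0) + a[i][j] * t / n_steps for j in range(d)] for i in range(d)]
    result = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    for _ in range(n_steps):
        result = mat_mul(result, step)
    return result


def max_row_sum(a):
    """Maximum absolute row sum, a submultiplicative matrix norm."""
    return max(sum(abs(x) for x in row) for row in a)


# Assumed example: A generates a rotation, so exp(T*A) is known in closed form.
A = [[0.0, 1.0], [-1.0, 0.0]]
T = 1.0
EXACT = [[math.cos(T), math.sin(T)], [-math.sin(T), math.cos(T)]]


def error(n_steps):
    approx = riemann_product(A, T, n_steps)
    return max(abs(approx[i][j] - EXACT[i][j]) for i in range(2) for j in range(2))


errors = {n: error(n) for n in (10, 100, 1000)}
norm_bound_holds = max_row_sum(riemann_product(A, T, 1000)) <= math.exp(max_row_sum(A) * T)
print(errors, norm_bound_holds)
```

The approximation error shrinks as the grid is refined, which mirrors the role of the mesh size $b_n$ in the bounds of the proof.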
6.2 Proof of Theorem 3.4 and Theorem 4.3



The convergence $P\big(\sup_{\tau \in [\tau_0,\tau_U]}\sup_{k \in \chi(\tau)^C}|\hat\beta_k(\tau)| = 0\big) \to 1$ is a direct consequence of the
results in Lemma 6.4.

Next, observe that sup^ sup ug ( T ^ \\n(fl(u)) — jj,{f3{jj + i)) \\ = 0(b n ) and similarly 
sup sup \\ij n (u) - ij n (r j+1 )\\ = 0(b n sup \\w n (r)\\ + u an (w n )) a.s. 

j ue(Tj,Tj +i ] T 



where we defined 

^n(r) := w n (r) 



[r ,r) V (".^1 



7T (l d + (MvMvYdHtv))) 1 M u M u w n (u)dH(u). 



Together with the results in Lemma 6.4 and Lemma 6.5, this yields the representation

1*0(8)) ~ 

= M s («;„(*) + / ( 7T f I d + (A^ v Af„)*dif(u)))*Al u A^ u if; ri (ii)dir(u)) + J^(s) 

V i[r ,s) V M V J J J 

uniformly in $s \in [\tau_0, \tau_U]$ where

$$\sup_{\tau \in [\tau_L,\tau_U]}\sqrt{n}\,\|R_n(\tau)\| = O_P\big(n^{1/2}b_n + n^{-1/2} + \omega_{c_n n^{-1/2}}(\sqrt{n}\,\nu_n) + \omega_{c_n n^{-1/2}}(\sqrt{n}\,\tilde\nu_n)\big)$$

under the assumptions of Theorem 3.4 and $\sup_{\tau}\sqrt{n}\,\|R_n(\tau)\| = o_P(1)$ under the assumptions of Theorem 4.3. Thus we have obtained representation (4.4), and a Taylor expansion combined with some simple algebra yields (3.3).

The weak convergence statements in both theorems follow by the continuous mapping theorem
[note that by assumption (A3) and equation (37) from Gill and Johansen (1990), the

components of the matrix ( 7T( W)T ] (ld+{M v M v ) t dH(y)\ j M. U M. U are uniformly bounded], 
and thus the proof is complete. □ 
6.3 Proof of Theorem 4.10

The following result can be proved by arguments similar to those used for Lemma 6.2.



Lemma 6.6 Let conditions (C1)-(C3) and (C4*) hold. Assume that $K \subset \chi(\tau_j)$ satisfies the
following conditions



sup 



^n(b) 



i> n (P(u))dH(u) 



TO 

n 



£(z. 



EZ,- 



i=i 



jl0(u)) - fi(P(u))dH(u) 



< a 2 



and 



ot\ + ct 2 + sup 



A, 



fee*" Pk{n,Tj 
2C 



+ ^ + C 2 sup | A ( Tjf ))|<^_ ra P^I^I 



(C , 5 + l)(ai + a 2 + — + sup Iftfa))]) +C B sup 

n k&K c ' kGK 



A, 



keKPk{n,Tj 



Ci 

< inf 

fcexc p k {n,Tj 



A„ 



(6.13) 
(6.14) 



Then any minimizer of $H_j$ defined in (2.6) is of the form $V_K(h(\tau_j)^t, 0^t)^t$ where $h(\tau_j)$ is a
minimizer of Hj (V^ 1 (h* , 



/,W(/3(r,))-^ K ^( 



over h G . Moreover, it holds that 
A,, 



< h sup 

n k£KPk{n,Tj) 



+ C 2 sup |/5 fe (Tj)| + ai + a 2 . 



For the proof of Theorem 4.10 we will consider points Tj such that Tj G f] k (Bk U 14) and 
Tj £ P U S separately. Note that for sufficiently large n, the set P U S is a union of finitely 
many disjoint intervals. Without loss of generality, assume that [TcTjvJ C f) k (P>k U 14) and 
[t Ni+ i, t N2 ] CPUS, [tjv 2 +i, t N3 ] C flfc(-^fc u ^fc) an d so on [°f course, iVi, jV 2 , ... depend on 
n, but we do not reflect this fact in the notation]. 

Introduce the 'oracle' penalty $p_k(n, \tau_j) := \infty\, I\{\beta_k(\tau_j) = 0\}$ and define $\hat\beta^o(\tau_j)$ as the solution
of the minimization in (2.6) based on this penalty. The basic idea for proving process
convergence is to show that the 'estimator' $\hat\beta^o(\tau_j)$ and $\hat\beta(\tau_j)$ have the same first-order
asymptotic expansion uniformly on $\tau_j \in P \cup S$. More precisely, we will show that



sup MP(t))-h({3°(t) 
-,-ePus 



Op n 



-1/2N 



(6.15) 



Note that by the arguments in the proof of Theorem 4.3 this directly implies the weak 
convergence in (4.9). 

In order to study the uniform rate of convergence of ${jj) on f] k (B k U 14), we need to 
introduce some additional notation Consider the non-overlapping sets 



A hn := {t : n "W > t > n- l /V /a ^" +1)/H }, J 



l,...,5d- I. 
Observe that for any $\tau$, the components of $\beta(\tau)$ are contained in at most $d$ of those sets and
thus for any $\tau$ there exist three consecutive sets containing no component of $\beta(\tau)$. Moreover,
the diameter of each $A_{j,n}$ is by construction of larger order than $n^{-1/2}$. Thus there exists a
function $j(\tau)$ such that the probability of the set

■= {|/3fe(r)| £ Aj(r) )n , k = l,...,d, Te[r L ,Tu}} 



tends to one. We will use Lemma 6.6 to show that in each step, coefficients with absolute
value below $n^{-1/4}\kappa_n^{1/2}c_n^{-(j(\tau_k)+1)/5d}$ will be set to zero with probability tending to one.
Define the quantities

An 



sup — 



^ |&(r)| 



Op{\/y/n) 



L n j ■= y/n inf 



A r 



oo. 



W~4 



sup 



A, 



O, 



(6.16) 
(6.17) 
(6.18) 



rePjUBj |/3j(r)| 
and $M_n := \sup_j M_{n,j}$, $L_n := \inf_j L_{n,j}$, $W_n := \sup_j W_{n,j}$.

Now begin by considering $\tau_j \in [\tau_0, \tau_{N_1}]$. A careful inspection of the proofs of Lemma 6.4, Lemma 6.5 and Theorem 4.3 shows that the arguments and expansions derived there continue
to hold and in particular that



sup M(3{r))-^°{r) 

Tj£[TO,T Nl ] 



Op[n 



and 



Rn,2 



fx((3(u)) - jl(l3(u))dH(u) = P {1/Vn) 



Next, consider $\tau_j \in [\tau_{N_1+1}, \tau_{N_2}]$. Define the quantities

A n C^X n 



U„ 



s n,0 



inf 



inf 



rei7i,^]^ 6 P(T),fceS(T) I |/3 fc (r)| |^-(r)| n 1 /^' c , 
(20, + C 2 



C 2 (l + C 5 ) ) 
1/2 (i(r)+l)/5d J ' 



+ + iL 1 + R n . 2 + 



u^kI/ 2 ^'' 



where 



je{l,...,d}: |ft(r)|> 



S(t) := \ je{l,...,d}: |&(r)|< 



c 



-j(r)/5d 



U(r)+l)/5d 



C 



n^K l J 2 



n 



Note that by the assumptions on K n , c n we have that U n is at least of the order Ann 1 / 4 /^ Cn 
which is of larger order then n~ 1 / 2 . In particular, this implies that the probability of the set 

-2C? 



n 



2.11 



{(1 + C 5 ) (-^ + i? nil + R n>2 + (N 2 - N x )b n C z C Ltl (8 nja + 6„C 4 )) < U n } 
where $C_{L,1} := \sup_n (1 + C_1 C_3 b_n)^{N_\tau(n)} < \infty$, tends to one by assumption (B3) since
$(N_2 - N_1)b_n = O(c_n/\kappa_n)$. In the following, we will show that on the set



S n ,0 



+ (iV 2 - iYi)6„C3C Lil ( Sn ,o + C 4 &„) < 



Ci 



it holds that for $l = 0, \dots, N_2 - N_1$

0{u))-fL^{u))dH{u) < Y,C 3 b j + C 4 b n ) < lb n C 3 C Lil {s ni0 + b n C 4 ) 



l-i 



[ t jv 1 >''"jv 1 +;) 



3=0 



(r Nl +i) - P(t Ni+1 )\\ < s n> i, 

where $s_{n,l}$ satisfies the relation

$$s_{n,l+1} = s_{n,0} + C_1 C_3 b_n\sum_{i=0}^{l}(s_{n,i} + C_4 b_n) = (1 + C_1 C_3 b_n)^{l+1}s_{n,0} + b_n^2 C_1 C_3 C_4\sum_{j=0}^{l}(1 + b_n C_1 C_3)^{j} \le C_{L,1}s_{n,0} + C_{L,1}b_n C_4 - C_4 b_n.$$

Note that the assertion for jx inductively follows from the assertions for j3 and s n j. To estab- 
lish those assertions, start by considering the case I = 0. Let \/3j(r Nl )\ > n~ 1//4 k^ 1 ^ l Cn ^ <yTN ^^ d 
if and only if j G K . By construction, conditions (6.13) and (6.14) hold on the set f2 3 ri 
with K = K Q , a\ = R nj i [with R n , 1 defined in equation (6.9)], a 2 = R n ,2- Thus on Q 3}n 
we have /^(r^J = for k G Kq and by Lemma 6.6 it holds that H/^T/vJ — ^(tatJH < s^.o- 
The rest of the assertion follows by iterating the above argument with oti = R n ,i, a 2 = 
R n ,2 + X^=o G 3 b n (s n j + C A b n ) in the Z'th step. 

This yields the assertions (4.10) and (4.8) on the set $[\tau_L, \tau_{N_2}]$. Note that by the computations
above

$$\Big\|\int_{[\tau_{N_1},\tau_{N_2})} \tilde\mu(\hat\beta(u)) - \tilde\mu(\beta(u))\,dH(u)\Big\| \le (N_2 - N_1)b_n C_3 C_{L,1}(s_{n,0} + C_4 b_n).$$

In particular, this implies that 



O 



O f 



o P {l/y/n). 



l T N 1 ,tn 2 ) 

Thus we obtain 



fi{P°{u)) - ji{p{u))dH{u) 



TN 1 ,TN 2 ) 



Opin 



-1/2^ 



sup |K/3»)-M/?(r) 

Tg[Tjv 2 + l,TJV 3 ] 



o P (n-^) 



by an iterative application of Lemma 6.6; the arguments are similar to the ones used in
the proofs of Lemma 6.4, Lemma 6.5 and Theorem 4.3. Finally, since the set $P \cup S$ is by
assumption a finite union of intervals, we can repeat the arguments above to extend the
proof to the whole interval $[\tau_L, \tau_U]$. □